[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894994#comment-15894994
 ] 

Adrien Grand commented on LUCENE-6819:
--

Thanks Steve, I just pushed a fix. Sorry for the noise.

> Deprecate index-time boosts?
> 
>
> Key: LUCENE-6819
> URL: https://issues.apache.org/jira/browse/LUCENE-6819
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-6819-deprecation.patch, 
> LUCENE-6819-deprecation.patch, LUCENE-6819-deprecation.patch, 
> LUCENE-6819.patch, LUCENE-6819.patch, LUCENE-6819-wip.patch
>
>
> Follow-up of this comment: 
> https://issues.apache.org/jira/browse/LUCENE-6818?focusedCommentId=14934801=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934801
> Index-time boosts are a very expert feature whose behaviour is tied to the 
> Similarity impl. Additionally users have often been confused by the poor 
> precision due to the fact that we encode values on a single byte. But now we 
> have doc values that allow you to encode any values the way you want with as 
> much precision as you need so maybe we should deprecate index-time boosts and 
> recommend encoding index-time scoring factors into doc values fields instead.
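To make the precision complaint in the description concrete, here is a minimal sketch. It assumes the boost is multiplied into the length norm {{1/sqrt(numTerms)}} and that the product is stored with the single-byte encoding behind {{SmallFloat.floatToByte315}}. The encode/decode helpers are re-implemented here for illustration rather than copied from Lucene, and the class name is hypothetical.

```java
public class BoostPrecisionSketch {

  // Assumed single-byte layout: 3 "mantissa" bits, zero exponent 15.
  static byte encode315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int small = bits >> (24 - 3);
    int zero = (63 - 15) << 3;
    if (small <= zero) {
      return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
    }
    if (small >= zero + 0x100) {
      return (byte) -1; // overflow: clamp to the largest representable value
    }
    return (byte) (small - zero);
  }

  static float decode315(byte b) {
    if (b == 0) {
      return 0f;
    }
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  // The norm as it would come back from the index: boost * 1/sqrt(numTerms),
  // squeezed through one byte.
  static float storedNorm(float boost, int numTerms) {
    float norm = boost * (float) (1.0 / Math.sqrt(numTerms));
    return decode315(encode315(norm));
  }

  public static void main(String[] args) {
    for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.25f}) {
      System.out.println("boost=" + boost + ", numTerms=4 -> stored norm "
          + storedNorm(boost, 4));
    }
  }
}
```

Under these assumptions, boosts of 1.0, 1.1 and 1.2 on a four-term field all come back as the same stored norm, and only 1.25 reaches the next representable step.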



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894995#comment-15894995
 ] 

ASF subversion and git services commented on LUCENE-6819:
-

Commit 7453f78b3539c7f4f5fa6e5324b251467ca50644 in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7453f78 ]

LUCENE-6819: Make ExtractingRequestHandlerTest not rely on index-time boosts.





[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-03 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894968#comment-15894968
 ] 

Steve Rowe commented on LUCENE-6819:


{{ExtractingRequestHandlerTest.testExtraction()}} has been failing since the 
master commit on this issue 
[https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ed2b76], e.g. 
[http://jenkins.sarowe.net/job/Lucene-Solr-tests-master/10364/]:

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=ExtractingRequestHandlerTest -Dtests.method=testExtraction 
-Dtests.seed=F122EB76205DC618 -Dtests.slow=true -Dtests.locale=es-HN 
-Dtests.timezone=America/Inuvik -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   0.12s J1 | ExtractingRequestHandlerTest.testExtraction <<<
   [junit4]> Throwable #1: java.lang.RuntimeException: Exception during 
query
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([F122EB76205DC618:48519F085C7516ED]:0)
   [junit4]>at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:919)
   [junit4]>at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:886)
   [junit4]>at 
org.apache.solr.handler.extraction.ExtractingRequestHandlerTest.testExtraction(ExtractingRequestHandlerTest.java:128)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4]> Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=//doc[1]/str[.='simple3']
   [junit4]>xml response was: 
   [junit4]> 
   [junit4]> 00stream_size365X-Parsed-Byorg.apache.tika.parser.DefaultParserX-Parsed-Byorg.apache.tika.parser.html.HtmlParserstream_content_typeapplication/xmlstream_namesimple.htmlstream_source_infofile:/var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/contrib/extraction/src/test-files/extraction/simple.htmldc:titleWelcome
 to 
SolrContent-EncodingISO-8859-1Content-Typetext/html;
 charset=ISO-8859-1recthttp://www.apache.orgsimple2365org.apache.tika.parser.DefaultParserorg.apache.tika.parser.html.HtmlParserapplication/xmlsimple.htmlfile:/var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/contrib/extraction/src/test-files/extraction/simple.htmlWelcome to SolrISO-8859-1Welcome to Solrtext/html; charset=ISO-8859-1 
   [junit4]>  
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>  Welcome to Solr 
   [junit4]>  
   [junit4]>  
   [junit4]>  
   [junit4]>   Here is some text
   [junit4]>  
   [junit4]>  distinct
   [junit4]> words 
   [junit4]>  Here is some text in a div 
   [junit4]>  This has a  link . 
   [junit4]>   muLti-Default422017-03-03T09:02:52.622Zstream_size365X-Parsed-Byorg.apache.tika.parser.DefaultParserX-Parsed-Byorg.apache.tika.parser.html.HtmlParserstream_content_typeapplication/xmlstream_namesimple.htmlstream_source_infofile:/var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/contrib/extraction/src/test-files/extraction/simple.htmldc:titleWelcome
 to 
SolrContent-EncodingISO-8859-1Content-Typetext/html;
 charset=ISO-8859-1recthttp://www.apache.orgsimple3365org.apache.tika.parser.DefaultParserorg.apache.tika.parser.html.HtmlParserapplication/xmlsimple.htmlfile:/var/lib/jenkins/jobs/Lucene-Solr-tests-master/workspace/solr/contrib/extraction/src/test-files/extraction/simple.htmlWelcome to SolrISO-8859-1Welcome to Solrtext/html; charset=ISO-8859-1 
   [junit4]>  
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>   
   [junit4]>  Welcome to Solr 
   [junit4]>  
   [junit4]>  
   [junit4]>  
   [junit4]>   Here is some text
   [junit4]>  
   [junit4]>  distinct
   [junit4]> words 
   [junit4]>  Here is some text in a div 
   [junit4]>  This has a  link . 
   [junit4]>   muLti-Default422017-03-03T09:02:52.660Z
   [junit4]> 
   [junit4]>request 
was:q=t_href:http=standard=0=20=2.2
   [junit4]>at 
org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:912)
[...]
   [junit4]   2> NOTE: test params are: 
codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST,
 chunkSize=6, maxDocsPerChunk=8, blockSize=301), 
termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, 
chunkSize=6, blockSize=301)), sim=RandomSimilarity(queryNorm=true): {}, 
locale=es-HN, timezone=America/Inuvik
   [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 
1.8.0_77 (64-bit)/cpus=16,threads=1,free=423346232,total=485490688
{noformat}

I got 10/10 failures when I beasted the test using Miller's beasting script on 
master HEAD.


[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892749#comment-15892749
 ] 

ASF subversion and git services commented on LUCENE-6819:
-

Commit 8ed2b764ed4d4d5203b5df1e16fdc1ffd640322c in lucene-solr's branch 
refs/heads/master from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8ed2b76 ]

LUCENE-6819: Remove index-time boosts.





[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892747#comment-15892747
 ] 

ASF subversion and git services commented on LUCENE-6819:
-

Commit 5f8a6dfff65599d961b99c6ff03b70b79e2ccaf4 in lucene-solr's branch 
refs/heads/branch_6x from [~jpountz]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5f8a6df ]

LUCENE-6819: Deprecate index-time boosts.





[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890496#comment-15890496
 ] 

Uwe Schindler commented on LUCENE-6819:
---

Atomic part is fine. When proposing not to use atomics I did not notice that 
the getAndSet does not happen for the common case (without boost), so there is 
no performance issue.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890472#comment-15890472
 ] 

Jan Høydahl commented on LUCENE-6819:
-

I like the Atomic part and alternating between WARN on first-time and DEBUG 
thereafter!

Should the deprecation patch have a solr/CHANGES.txt entry even if it is a 
LUCENE issue? Well, it's a hybrid issue, and we have explicitly referenced 
LUCENE issues in solr CHANGES earlier.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890337#comment-15890337
 ] 

Uwe Schindler commented on LUCENE-6819:
---

"correct" way would be to use an AtomicBoolean, but a conventional, static 
boolean next to the LOGGER declaration should be fine, too. On concurrent use, 
you may get multiple warnings for a short time until cache flush, but who cares?
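A minimal sketch of the "warn on first occurrence, debug afterwards" idea with an {{AtomicBoolean}}. It uses {{java.util.logging}} only to stay dependency-free (Solr logs through SLF4J), so {{Level.FINE}} stands in for DEBUG. The class name, method name and message text are hypothetical. The flag is an instance field here so the sketch is easy to exercise; in practice it would presumably be a static field next to the logger declaration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

public class BoostDeprecationLogSketch {

  private static final Logger LOGGER =
      Logger.getLogger(BoostDeprecationLogSketch.class.getName());

  // Flips to true the first time a non-default boost is seen.
  private final AtomicBoolean warned = new AtomicBoolean(false);

  /** Returns the level that was used, or null when nothing was logged. */
  Level onBoost(float boost) {
    if (boost == 1f) {
      return null; // common case: no boost, so no atomic operation at all
    }
    // getAndSet returns the previous value, so exactly one caller sees false,
    // even when several threads get here at the same time.
    Level level = warned.getAndSet(true) ? Level.FINE : Level.WARNING;
    LOGGER.log(level, "Index-time boost {0} supplied; index-time boosts are deprecated", boost);
    return level;
  }

  public static void main(String[] args) {
    BoostDeprecationLogSketch sketch = new BoostDeprecationLogSketch();
    System.out.println(sketch.onBoost(1f)); // nothing logged
    System.out.println(sketch.onBoost(2f)); // first boosted value
    System.out.println(sketch.onBoost(3f)); // later boosted values
  }
}
```

A plain static boolean would also work, at the cost of possibly emitting a few duplicate warnings under concurrent use.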




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890296#comment-15890296
 ] 

Adrien Grand commented on LUCENE-6819:
--

Sure, I can attach a patch for the deprecation separately.

bq.  perhaps that should be on WARN level but log only the first occurrence?

Is there an easy way to do this?




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890235#comment-15890235
 ] 

Jan Høydahl commented on LUCENE-6819:
-

Wow, there is a lot of Solr code handling index time boosts :-)
Will you attach a separate 6x deprecation patch?

Deprecating the relevant solrj methods sounds fine. And for the deprecation 
warning, perhaps that should be on WARN level but log only the first 
occurrence? That way people would spot this in production on 6.5, even if they 
did not read the release notes closely.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890224#comment-15890224
 ] 

Yonik Seeley commented on LUCENE-6819:
--

bq.  It makes Solr ignore index-time boosts, and log a debug message when a 
value that is different from 1.0 is supplied, in order to not break hard for 
users who would go to 7.0 without making sure to never set boosts.

+1, I think this is the right approach.





[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889811#comment-15889811
 ] 

Jan Høydahl commented on LUCENE-6819:
-

If removal is planned for 7.0, we must make sure at least one minor 6.x 
release contains the deprecation and a documentation heads-up, also for Solr; 
otherwise I'll be -1 on committing this.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-03-01 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889768#comment-15889768
 ] 

Jan Høydahl commented on LUCENE-6819:
-

Index time boosts are documented in 
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-AddingDocuments
 and added like this
{code:xml}

  bar

{code}

I won't be sorry if they disappear, never recommend their usage anyway :)




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-02-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888561#comment-15888561
 ] 

Uwe Schindler commented on LUCENE-6819:
---

+1 to remove index-time boosts. I always recommend that users add doc values 
fields and use a function query (it's just wrapping the query, very easy 
anyway!). About Solr users: I don't even know if it is at all possible to add 
index-time boosts with Solr?
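As a toy model of "wrapping the query", the sketch below multiplies a base score by a per-document factor kept at full float precision. It deliberately does not use Lucene classes: the {{IntToDoubleFunction}} stands in for a query scorer, the {{float[]}} stands in for a doc values field, and {{boostByValue}} is a hypothetical helper name used only for illustration.

```java
import java.util.function.IntToDoubleFunction;

public class DocValueBoostSketch {

  // Wraps a base scorer (doc id -> score) so that every score is multiplied
  // by the factor stored for that document.
  static IntToDoubleFunction boostByValue(IntToDoubleFunction baseScore, float[] factorByDoc) {
    return doc -> baseScore.applyAsDouble(doc) * factorByDoc[doc];
  }

  public static void main(String[] args) {
    IntToDoubleFunction base = doc -> 2.0;   // every document matches equally well
    float[] factors = {1.0f, 1.1f, 1.2f};    // kept as floats, not squeezed into one byte
    IntToDoubleFunction boosted = boostByValue(base, factors);
    for (int doc = 0; doc < factors.length; doc++) {
      System.out.println("doc " + doc + " -> " + boosted.applyAsDouble(doc));
    }
  }
}
```

Because the factor is stored separately from the norm, its precision and its effect on the score no longer depend on how the Similarity encodes norms.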




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-02-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876549#comment-15876549
 ] 

David Smiley commented on LUCENE-6819:
--

I get your point. It's a shame that the current use of the bits makes both 
3 terms and 4 terms produce the same norm when, IMO, there should be more 
fidelity for them, for the same reason you mentioned. Maybe this specifically 
could be rectified instead of removing index-time boosts? 

Perhaps index-time boost support should be moved to the codec {{NormsFormat}}, 
which could have a method to declare whether it supports index-time boosts or 
not? I.e. we don't support it by default, and if you want index-time boosts 
then you must do something to enable it?

On the other hand, I appreciate that removing this feature would be the 
simplest route to take and would reduce overall complexity in Lucene. And it's 
not like index-time boosts are a must-have; users can emulate them, albeit 
with some work. Maybe that could be made easier... hmmm.

Anyway, I'm not standing in your way. I'm curious what others think.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-02-16 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870559#comment-15870559
 ] 

Adrien Grand commented on LUCENE-6819:
--

I agree index-time and search-time boosting have different trade-offs that may 
both be interesting. The problem I have is that supporting index-time boosts 
means that length norm is less accurate for _everyone_. Right now if you do not 
use index-time boosts, which I think is the case for a majority of users, you 
end up with a length norm that is between 0 and 1 ({{1/sqrt(fieldLen)}}). The 
length norm may only be greater than 1 if you use a boost that is greater than 
1. Out of the 256 values that {{SmallFloat.byte315ToFloat}} supports, only 125 
of them are less than or equal to 1, the other 131 values are all greater than 
1. Put another way, more than half the norm values we support are wasted if you 
do not use index-time boosts.
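The 125/131 split can be checked with a small sketch that decodes all 256 byte values. The decoder below is re-implemented for illustration on the assumption that it mirrors {{SmallFloat.byte315ToFloat}} (3 mantissa bits, zero exponent 15); the class and method names are hypothetical.

```java
public class Byte315RangeSketch {

  // Assumed decoding of the single-byte norm format.
  static float decode315(byte b) {
    if (b == 0) {
      return 0f;
    }
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  // Counts how many of the 256 byte values decode to something <= 1.
  static int countAtMostOne() {
    int count = 0;
    for (int i = 0; i < 256; i++) {
      if (decode315((byte) i) <= 1f) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    int atMostOne = countAtMostOne();
    System.out.println("values <= 1: " + atMostOne);
    System.out.println("values  > 1: " + (256 - atMostOne));
  }
}
```

The count of values at most 1 includes the byte reserved for 0.0.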

If instead we could assume that norms were always between 0 and 1, we could 
take one bit from the exponent and spend it on the mantissa instead to improve 
accuracy. For instance I rebuilt the table that had been built for LUCENE-5005 
and expanded it with a couple more length values, as well as what the rounded 
norm would be if we spent 1 more bit on the mantissa (while still being able to 
encode the norm on a single byte, see the float415 column):

||numTerms||1/sqrt(numTerms)||1/sqrt(numTerms) to float315||1/sqrt(numTerms) to float415||
| 1 | 1.0 | 1.0 | 1.0 |
| 2 | 0.70710677 | 0.625 | 0.6875 |
| 3 | 0.57735026 | 0.5 | 0.5625 |
| 4 | 0.5 | 0.5 | 0.5 |
| 5 | 0.4472136 | 0.4375 | 0.4375 |
| 6 | 0.4082483 | 0.375 | 0.40625 |
| 7 | 0.37796447 | 0.375 | 0.375 |
| 8 | 0.35355338 | 0.3125 | 0.34375 |
| 9 | 0.33333334 | 0.3125 | 0.3125 |
| 10 | 0.31622776 | 0.3125 | 0.3125 |
| 11 | 0.30151135 | 0.25 | 0.28125 |
| 12 | 0.28867513 | 0.25 | 0.28125 |
| 13 | 0.2773501 | 0.25 | 0.25 |
| 14 | 0.26726124 | 0.25 | 0.25 |
| 15 | 0.2581989 | 0.25 | 0.25 |
| 16 | 0.25 | 0.25 | 0.25 |
| 17 | 0.24253562 | 0.21875 | 0.234375 |
| 18 | 0.23570226 | 0.21875 | 0.234375 |
| 19 | 0.22941573 | 0.21875 | 0.21875 |
| 20 | 0.2236068 | 0.21875 | 0.21875 |

Something I really like about it is that for all length values between 1 and 9 
inclusive, you get different values for the rounded norms. I have seen several 
users asking why "A B C D" would score as well as "A B C" when the query is e.g. 
"A" in spite of being longer, and if we could get this addressed for short 
fields (think e.g. product names), I think that would be a great win.
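For anyone who wants to regenerate the table, here is a rough sketch. It assumes "float315" and "float415" mean the same single-byte layout with 3 and 4 mantissa bits respectively and a zero exponent of 15. The generic encode/decode helpers are written for illustration in the style of {{SmallFloat}}, not copied from it, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class NormTableSketch {

  static byte encode(float f, int mantissaBits, int zeroExp) {
    int zero = (63 - zeroExp) << mantissaBits;
    int bits = Float.floatToRawIntBits(f);
    int small = bits >> (24 - mantissaBits);
    if (small <= zero) {
      return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative, or underflow
    }
    if (small >= zero + 0x100) {
      return (byte) -1; // overflow
    }
    return (byte) (small - zero);
  }

  static float decode(byte b, int mantissaBits, int zeroExp) {
    if (b == 0) {
      return 0f;
    }
    int bits = (b & 0xff) << (24 - mantissaBits);
    bits += (63 - zeroExp) << 24;
    return Float.intBitsToFloat(bits);
  }

  // 1/sqrt(numTerms) after a round trip through one byte.
  static float rounded(int numTerms, int mantissaBits) {
    float norm = (float) (1.0 / Math.sqrt(numTerms));
    return decode(encode(norm, mantissaBits, 15), mantissaBits, 15);
  }

  // Number of distinct rounded norms for field lengths 1..maxTerms.
  static int distinctUpTo(int maxTerms, int mantissaBits) {
    Set<Float> seen = new HashSet<>();
    for (int n = 1; n <= maxTerms; n++) {
      seen.add(rounded(n, mantissaBits));
    }
    return seen.size();
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 20; n++) {
      System.out.println(n + " | " + (float) (1.0 / Math.sqrt(n))
          + " | " + rounded(n, 3) + " | " + rounded(n, 4));
    }
    System.out.println("distinct norms for lengths 1..9, float315: " + distinctUpTo(9, 3));
    System.out.println("distinct norms for lengths 1..9, float415: " + distinctUpTo(9, 4));
  }
}
```

With the extra mantissa bit, the nine shortest field lengths each get their own norm, which is the property that matters for short fields such as product names.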




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870286#comment-15870286
 ] 

David Smiley commented on LUCENE-6819:
--

Index-time boosts are valuable.  I've found that tuning index-time boosts is 
the easiest way to boost documents, and the one with the most predictable 
effect, relative to query-time boosts.  Of course it's inflexible, but it's a 
trade-off.




[jira] [Commented] (LUCENE-6819) Deprecate index-time boosts?

2017-02-15 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867951#comment-15867951
 ] 

Adrien Grand commented on LUCENE-6819:
--

An interesting side-effect of such a change is that we could make the length 
normalization factors more accurate. If we remove index-time boosts, then 
length normalization factors would always be less than or equal to 1 so we 
could take one bit from the exponent and use it for the mantissa instead.
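
To illustrate why the exponent bit is spare, here is a small sketch that decodes the largest byte value under both layouts. It assumes the current encoding is the float315 scheme (3 mantissa bits, zero exponent 15) and that the proposed one would use 4 mantissa bits with the same zero exponent; the decoding is modeled on the generic byte-to-float conversion in Lucene's SmallFloat, and the class and method names (NormRangeSketch, maxValue) are hypothetical.

```java
/** Illustrative sketch only; modeled on the generic decoding in Lucene's SmallFloat. */
final class NormRangeSketch {

  /** Decodes a single-byte small float for the given layout. */
  static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
    if (b == 0) {
      return 0.0f;
    }
    int bits = (b & 0xff) << (24 - numMantissaBits);
    bits += (63 - zeroExp) << 24;
    return Float.intBitsToFloat(bits);
  }

  /** Largest value the layout can represent, i.e. the decoding of byte 0xFF. */
  static float maxValue(int numMantissaBits, int zeroExp) {
    return byteToFloat((byte) 0xff, numMantissaBits, zeroExp);
  }

  public static void main(String[] args) {
    // Assumed current layout: range reaches far above 1, which only boosts can use.
    System.out.println("float315 max = " + maxValue(3, 15));
    // Assumed proposed layout: range still covers 1.0, with finer steps below it.
    System.out.println("float415 max = " + maxValue(4, 15));
  }
}
```

Under these assumptions the float315 maximum is 1.75 * 2^32, while the 4-bit-mantissa layout tops out at 1.875, which is still enough for factors that never exceed 1.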
