[ https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3430:
--------------------------------

    Attachment: LUCENE-3430.patch

Patch attached. My modifications to the other sims take the same approach as Lucene's sim.

I ran the relevance tests (across all 129 possibilities) with short queries
and saw no problems. I'm still waiting on my computer for the long queries;
if those come back OK, I'd like to commit.


> TestParser.testSpanTermXML fails with some sims
> -----------------------------------------------
>
>                 Key: LUCENE-3430
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3430
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-3430.patch
>
>
> Here is why this test sometimes fails (my explanation, from the test I wrote):
> {noformat}
>   /** make sure all sims work with spanOR(termX, termY) where termY does not exist */
>   public void testCrazySpans() throws Exception {
>     // The problem: "normal" lucene queries create scorers, returning null if terms don't exist.
>     // This means they never score a term that does not exist.
>     // However, with spans there is only one scorer for the whole hierarchy:
>     // inner queries are not real queries, their boosts are ignored, etc.
> {noformat}
> Basically, SpanQueries aren't really queries: you get just one scorer. It
> calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
> the whole bag of terms, even those that don't exist.
> This is fine in itself: we already have tests that sims won't bug out in
> computeStats() here. However, the sims don't expect to actually score
> documents based on terms that don't exist, and that is exactly what happens
> in Spans, because it doesn't use sub-scorers.
> Lucene's sim avoids this with the (docFreq + 1)
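
To illustrate why the (docFreq + 1) matters for a term that does not exist, here is a minimal standalone sketch. It is not the code from the patch: the class and method names (IdfSketch, naiveIdf, smoothedIdf) are made up for illustration, and the smoothed formula is written in the style of Lucene's default IDF, 1 + ln(numDocs / (docFreq + 1)), as an assumption rather than a copy of any sim's source.

```java
// Hypothetical sketch, not part of LUCENE-3430.patch.
class IdfSketch {

    /** Unsmoothed IDF: ln(numDocs / docFreq). Division by zero when docFreq == 0. */
    static double naiveIdf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (double) docFreq);
    }

    /** Smoothed IDF in the style of Lucene's default sim: 1 + ln(numDocs / (docFreq + 1)). */
    static double smoothedIdf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 100;
        long missingTermDocFreq = 0; // termY does not exist in the index

        // A spans scorer still computes a weight for termY, so docFreq == 0 reaches the sim.
        System.out.println("naive:    " + naiveIdf(missingTermDocFreq, numDocs));
        System.out.println("smoothed: " + smoothedIdf(missingTermDocFreq, numDocs));
    }
}
```

With docFreq == 0 the unsmoothed variant divides by zero and yields an infinite weight, while the smoothed variant stays finite, so a sim that ends up scoring a document against such a term still produces a usable score.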


        
