romseygeek opened a new pull request #1618:
URL: https://github.com/apache/lucene-solr/pull/1618


   Given the input text 'A B A C', an ordered interval 'A B C' will currently 
return an incorrect
   internal [2, 3] in addition to the correct [0, 3] interval.  This is due to 
a bug in the ORDERED
   algorithm, where we assume that after the first interval is returned, the 
sub-intervals are 
   always in-order.  This assumption only holds during minimization, as 
minimizing an interval
   may move the earlier terms beyond the trailing terms.
   
   For example, after the initial [0, 3] interval is found above, the algorithm 
will attempt to
   minimize it by advancing A to [2,2].  Because this is still before C at 
[3,3], but after B at 
   [1,1], we then try advancing B, leaving it at [Inf,Inf].  Minimization has 
failed, so we return
   out original interval of [0,3].  However, when we come to retrieve the next 
interval, our
   subintervals look like this: A[2,2], B[Inf,Inf], C[3,3] - the assumption 
that they are in order
   is broken.  The algorithm sees that A is before B, assumes that therefore 
all subsequent
   subintervals are in order, and returns the new interval.
   
   This commit fixes things by changing the assumption of ordering to only hold 
during 
   minimization.  When first finding a candidate interval, the algorithm now 
checks that
   all sub-intervals appear in order.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to