On 22-Apr-08, at 6:00 PM, Christian Wittern wrote:
Mike Klaas wrote:
On 19-Apr-08, at 3:02 AM, Christian Wittern wrote:
So it could be that the match is not part of the fragment? This
sounds a bit strange. Is there a way to make sure the fragment
contains the match other than returning the whole field and doing the
fragmenting myself?
[...]
As you can see, only fragments containing a match are returned
(note that there are very often multiple matches--you seemed to
assume only one).
Mike, thank you for the clarification. Now I understand what went
wrong in the example I looked at. I am querying n-gram-indexed
data (Chinese text). A user enters two or three characters and
expects them to be matched more or less as a substring. The
fragment I looked at contained only one of the characters (the
other was cut off at the end), which is what made me wonder.
From what you say, even adding quotation marks around the query will
not prevent this from happening (in this case, it would simply
obscure the match).
Are there any plans to improve the algorithm for fragmentation? Or
are there other workarounds?
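
To make the boundary problem concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions: it works on raw characters
rather than Lucene tokens, it is neither the Solr highlighter nor the
LUCENE-794 implementation, and the function names (fixed_fragments,
find_matches, match_centered_fragments) are hypothetical. The first function
cuts the text into fixed-size windows without looking at match positions, so
a two-character query can be split across two fragments. The last one builds
each window around a complete match instead, which is roughly what "doing the
fragmenting myself" on the whole field would look like.

```python
def fixed_fragments(text, size):
    """Cut text into consecutive fixed-size windows, ignoring match positions."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def find_matches(text, query):
    """Return the start offset of every occurrence of query (overlaps allowed)."""
    offsets = []
    start = text.find(query)
    while start != -1:
        offsets.append(start)
        start = text.find(query, start + 1)
    return offsets


def match_centered_fragments(text, query, size):
    """Build one window of about `size` characters around each complete match.

    Hypothetical workaround: the window is placed around the match, so the
    match can never be cut off at a fragment boundary.
    """
    fragments = []
    for start in find_matches(text, query):
        end = start + len(query)
        pad = max(size - len(query), 0)      # context characters to spread around
        left = max(start - pad // 2, 0)      # clamp at the start of the text
        right = min(max(end, left + size), len(text))  # always cover the match
        fragments.append(text[left:right])
    return fragments


text = "子曰學而時習之不亦說乎有朋自遠方來不亦樂乎"
query = "時習"

# Fixed windows of 5 characters split the query across two fragments.
print(fixed_fragments(text, 5))

# Windows built around the match keep both characters together.
print(match_centered_fragments(text, query, 5))
```

With this sample text, the first fixed window ends in "時" and the second
begins with "習", so neither contains the full query, while the
match-centered version returns "而時習之不".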
LUCENE-794 contains an implementation that solves this problem. My
plan is to integrate this into Solr eventually, but I don't
see myself having time for this in the short or medium term.
Contributions welcome :)
-Mike