On 22-Apr-08, at 6:00 PM, Christian Wittern wrote:
Mike Klaas wrote:
On 19-Apr-08, at 3:02 AM, Christian Wittern wrote:
So it could be that the match is not part of the fragment? This sounds a bit strange. Is there a way to make sure the fragment contains the match other than returning the whole field and doing the fragmenting myself?

[...]
As you can see, only fragments containing a match are returned (note that there are very often multiple matches--you seemed to assume only one).

Mike, thank you for the clarification. Now I understand what went wrong in the example I looked at. I am querying n-gram-indexed data (Chinese text). A user enters two or three characters and expects them to be matched more or less as a substring. The fragment I looked at contained only one of the characters (the other was cut off at the end), which is what made me wonder. From what you say, even adding quotation marks around the query will not prevent this from happening (in this case, it would simply obscure the match). Are there any plans to improve the fragmentation algorithm? Or are there other workarounds?

LUCENE-794 contains an implementation that solves this problem. My plan is to integrate it into Solr eventually, but I don't see myself having time for this in the short or medium term.

Contributions welcome :)
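
Until then, the workaround you mentioned (return the whole field and fragment on the client) is probably the simplest. Below is a rough sketch in Python of what that could look like. It is not what LUCENE-794 or the Solr highlighter does; the function names, the `context` parameter and the `<em>` markers are made up for illustration, and it assumes a plain substring match is good enough for your two- or three-character queries.

```python
def fragments_around_matches(text, query, context=10):
    """Return (start, end) windows of `text`, each fully containing
    at least one occurrence of `query`.

    Windows are built around the match instead of cutting the text at
    fixed offsets, so a match can never be split at a fragment edge.
    """
    if not query:
        return []
    windows = []
    pos = text.find(query)
    while pos != -1:
        start = max(0, pos - context)
        end = min(len(text), pos + len(query) + context)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: extend it.
            windows[-1] = (windows[-1][0], end)
        else:
            windows.append((start, end))
        pos = text.find(query, pos + 1)
    return windows


def highlight(text, query, context=10, pre="<em>", post="</em>"):
    """Return the fragments as strings with every full match marked up."""
    return [
        text[start:end].replace(query, pre + query + post)
        for start, end in fragments_around_matches(text, query, context)
    ]
```

Python strings are sequences of code points, so the offsets count Chinese characters rather than bytes. Several matches that sit close together end up in one merged fragment instead of several overlapping ones.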

-Mike
