Re: edismax parser ignores mm parameter when tokenizer splits tokens (hyphenated words, WDF splitting etc)
Opened a JIRA issue: https://issues.apache.org/jira/browse/SOLR-3589, which also lists a couple of other related mailing list posts.

On Thu, Jun 28, 2012 at 12:18 PM, Tom Burton-West tburt...@umich.edu wrote:

Hello,

My previous e-mail with a CJK example has received no replies. I verified that this problem also occurs for English. For example, in the case of the word "fire-fly", the ICUTokenizer and the WordDelimiterFilter both split it into two tokens, "fire" and "fly".

With an edismax query and a minimum should match (mm) of 2, q={!edismax mm=2}, if the words are entered separately as [fire fly], the edismax parser honors the mm parameter and does the equivalent of a Boolean AND query. However, if the words are entered as a hyphenated word [fire-fly], the tokenizer splits it into the two tokens fire and fly, and the edismax parser does the equivalent of a Boolean OR query.

I'm not sure I understand the output of debugQuery, but judging by the number of hits returned, it appears that edismax is not honoring the mm parameter.

Am I missing something, or is this a bug? I'd like to file a JIRA issue, but first want to find out whether I am missing something here.

Details of several queries are appended below.

Tom Burton-West

edismax mm=2 query with hyphenated word [fire-fly]:

<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire-fly</str>
<str name="querystring">{!edismax mm=2}fire-fly</str>
<str name="parsedquery">+DisjunctionMaxQuery(((ocr:fire ocr:fly)))</str>
<str name="parsedquery_toString">+((ocr:fire ocr:fly))</str>

edismax mm=2 query entered as separate words [fire fly]: numFound=184962

<lst name="debug">
<str name="rawquerystring">{!edismax mm=2}fire fly</str>
<str name="querystring">{!edismax mm=2}fire fly</str>
<str name="parsedquery">+((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)</str>

Regular Boolean AND query [fire AND fly]: numFound=184962

<str name="rawquerystring">fire AND fly</str>
<str name="querystring">fire AND fly</str>
<str name="parsedquery">+ocr:fire +ocr:fly</str>
<str name="parsedquery_toString">+ocr:fire +ocr:fly</str>

Regular Boolean OR query [fire OR fly]: numFound=366047

<lst name="debug">
<str name="rawquerystring">fire OR fly</str>
<str name="querystring">fire OR fly</str>
<str name="parsedquery">ocr:fire ocr:fly</str>
<str name="parsedquery_toString">ocr:fire ocr:fly</str>
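
To make the difference between the parsed queries above concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene code: it models only the boolean shape of each parsedquery string on a tiny invented corpus and ignores scoring. The documents and the helper names (`term`, `should`, `dismax`, `must`, `num_found`) are assumptions made for illustration. The sketch relies on one Lucene behavior: a group of optional clauses with no `~N` suffix matches as soon as one clause matches.

```python
# Toy model of the parsed queries above. Not Solr/Lucene code: documents are
# plain sets of tokens, and each query is a predicate over one document.

def term(token):
    """Match documents that contain the token (stands in for ocr:token)."""
    return lambda doc: token in doc

def should(clauses, min_match=1):
    """Group of optional clauses; `~N` in the debug output maps to min_match=N."""
    return lambda doc: sum(1 for clause in clauses if clause(doc)) >= min_match

def dismax(clauses):
    """DisjunctionMaxQuery matches if any sub-query matches (scoring ignored)."""
    return lambda doc: any(clause(doc) for clause in clauses)

def must(clauses):
    """All clauses required, as in `+ocr:fire +ocr:fly`."""
    return lambda doc: all(clause(doc) for clause in clauses)

# Hypothetical corpus, invented for illustration only.
DOCS = [
    {"fire", "fly"},
    {"fire", "truck"},
    {"house", "fly"},
    {"water"},
]

def num_found(query):
    """Count matching documents, analogous to numFound."""
    return sum(1 for doc in DOCS if query(doc))

# [fire-fly], mm=2: +DisjunctionMaxQuery(((ocr:fire ocr:fly)))
hyphenated = dismax([should([term("fire"), term("fly")])])

# [fire fly], mm=2: +((DisjunctionMaxQuery((ocr:fire)) DisjunctionMaxQuery((ocr:fly)))~2)
separate = should([dismax([term("fire")]), dismax([term("fly")])], min_match=2)

# [fire AND fly]: +ocr:fire +ocr:fly
boolean_and = must([term("fire"), term("fly")])

# [fire OR fly]: ocr:fire ocr:fly
boolean_or = should([term("fire"), term("fly")])

if __name__ == "__main__":
    for label, query in [
        ("fire-fly (edismax mm=2)", hyphenated),
        ("fire fly (edismax mm=2)", separate),
        ("fire AND fly", boolean_and),
        ("fire OR fly", boolean_or),
    ]:
        print(f"{label}: numFound={num_found(query)}")
```

On this toy corpus the hyphenated form matches the same documents as the OR query, and the separate-word form matches the same documents as the AND query. In the hyphenated case the `~2` never appears in the parsed query, so nothing requires both tokens to match.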