Why special character is handled differently by standard/lucene query parser?
Hi,

When user-entered text contains special characters, can this be taken care of by the tokenizer/filter configured on the field? In application code, do I need to parse the user input string and add an escape in front of those special characters? If so, do the special characters differ between languages, such as English versus Chinese?

As of now, I don't parse those special characters, and I am getting inconsistent/strange behavior and errors. For example:

1. search: title_name_en_US:(my! god)

Solr thinks the second term, god, is something NOT to include. Why is that?

<lst name="debug">
<str name="rawquerystring">title_name_en_US:(my! god)</str>
<str name="querystring">title_name_en_US:(my! god)</str>
<str name="parsedquery">title_name_en_US:my -title_name_en_US:god</str>
<str name="parsedquery_toString">title_name_en_US:my -title_name_en_US:god</str>

2. search: title_name_en_US:my!

Solr returns an error instead, which is even worse:

INFO: [titles] webapp=/solr path=/select params={explainOther=&fl=*,score&debugQuery=on&indent=on&start=0&q=title_name_en_US:(Oh!)&hl.fl=&qt=standard&wt=standard&fq=&rows=10&version=2.2} status=400 QTime=0
May 7, 2011 2:13:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'title_name_en_US:Oh!': Encountered EOF at line 1, column 20.
Was expecting one of:
    ( ...
    * ...
    QUOTED ...
    TERM ...
    PREFIXTERM ...
    WILDTERM ...
    [ ...
    { ...
    NUMBER ...
    TERM ...
    * ...
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'title_name_en_US:my!': Encountered EOF at line 1, column 20.
Was expecting one of:
    ( ...
    * ...
    QUOTED ...
    TERM ...
    PREFIXTERM ...
    WILDTERM ...
    [ ...
    { ...
    NUMBER ...
    TERM ...
    * ...
    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:205)

--
View this message in context: http://lucene.472066.n3.nabble.com/Why-special-character-is-handled-differently-by-standard-lucene-query-parser-tp2910692p2910692.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Why special character is handled differently by standard/lucene query parser?
On Fri, May 6, 2011 at 10:35 PM, cyang2010 ysxsu...@hotmail.com wrote:
> When user-entered text contains special characters, can this be taken care
> of by the tokenizer/filter configured on the field? In application code,
> do I need to parse the user input string and add an escape in front of
> those special characters? If so, do the special characters differ between
> languages, such as English versus Chinese?
>
> As of now, I don't parse those special characters, and I am getting
> inconsistent/strange behavior and errors. For example:
>
> 1. search: title_name_en_US:(my! god)
>
> Solr thinks the second term, god, is something NOT to include. Why is that?

! is a synonym for the NOT operator in the Lucene query parser syntax. The fact that it's treated as an operator even when followed by whitespace is a bug. This was fixed by LUCENE-2566 (which is in the trunk version, but not in 3.1).

One workaround is to escape the ! or quote the term:

title_name_en_US:(my\! god)
title_name_en_US:("my!" god)

In general, the Lucene query parser isn't meant for directly handling literal user queries, since it has a stricter syntax (like SQL). Something like the dismax or edismax query parser may help (try adding defType=dismax to your request). They are designed to try to never throw exceptions.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
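The escaping workaround above can be sketched in application code. The special characters belong to the query parser's syntax rather than to the field's tokenizer/filter chain, so they are the same ASCII characters whatever language the text is in. The class below is a hand-rolled illustration only: QueryEscaper is a hypothetical name, and the character list is an assumption based on the classic Lucene query syntax. In a real application it is probably better to call the library helpers, QueryParser.escape(String) in Lucene or ClientUtils.escapeQueryChars(String) in SolrJ.

```java
// Hypothetical helper for illustration; not part of Lucene or Solr.
final class QueryEscaper {
    // Characters assumed to carry meaning in the classic Lucene query syntax:
    // \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QueryEscaper() {}

    /** Returns the input with a backslash in front of every query-syntax character. */
    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length() + 8);
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // neutralize the operator meaning
            }
            sb.append(c);        // non-ASCII text (e.g. Chinese) passes through unchanged
        }
        return sb.toString();
    }
}
```

With this sketch, QueryEscaper.escape("my!") yields my\! so the query can be sent as title_name_en_US:(my\! god), which is the first workaround shown above.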
Re: Why special character is handled differently by standard/lucene query parser?
I know about dismax, but with it I can't perform prefix and fuzzy queries. Can edismax handle prefix and fuzzy queries?

My application logic just passes the user-entered text to the Solr server to perform term, phrase, prefix, and fuzzy queries. I don't want to escape the special characters by parsing the Java string, since I might deal with text in different languages. That is why I also asked whether those special characters are language-specific or language-agnostic.

Looking forward to your answers.

cy
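One way to keep prefix and fuzzy queries on the standard parser is to escape only the user-entered text and then append the operator in application code. The sketch below assumes that approach: UserQueryBuilder and its method names are hypothetical, the special-character list is an assumption based on the classic Lucene query syntax, and term/prefix/fuzzy expect a single word without whitespace. Whether the escaped term then matches anything still depends on how the field is analyzed, which this sketch does not address.

```java
// Hypothetical helper for illustration; not part of Lucene or Solr.
final class UserQueryBuilder {
    // Characters assumed to carry meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private UserQueryBuilder() {}

    /** Puts a backslash in front of every query-syntax character. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Term query on a single word: field:word */
    static String term(String field, String word) {
        return field + ":" + escape(word);
    }

    /** Prefix query: the trailing * is added after escaping, so it stays an operator. */
    static String prefix(String field, String word) {
        return field + ":" + escape(word) + "*";
    }

    /** Fuzzy query: the trailing ~ is added after escaping, so it stays an operator. */
    static String fuzzy(String field, String word) {
        return field + ":" + escape(word) + "~";
    }

    /** Phrase query: the escaped text is wrapped in double quotes. */
    static String phrase(String field, String text) {
        return field + ":\"" + escape(text) + "\"";
    }
}
```

For example, UserQueryBuilder.prefix("title_name_en_US", "my!") produces title_name_en_US:my\!* and the same escaping applies unchanged to English or Chinese input, because only the ASCII syntax characters are touched.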