Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread cyang2010
Hi, 

When user-entered text contains special characters, can this be taken care of
by the tokenizer/filter configured on the field?

In application code, do I need to parse the user input string and add an
escape in front of each special character?  If so, do those special
characters differ between languages, such as English versus Chinese?

So far I have not escaped those special characters, and I am getting
inconsistent/strange behavior and errors.  For example:

1. search: title_name_en_US:(my! god)
Solr thinks the second term, god, is something NOT to include. Why is that?
<lst name="debug">
<str name="rawquerystring">title_name_en_US:(my! god)</str>
<str name="querystring">title_name_en_US:(my! god)</str>
<str name="parsedquery">title_name_en_US:my -title_name_en_US:god</str>
<str name="parsedquery_toString">title_name_en_US:my -title_name_en_US:god</str>

2. search: title_name_en_US:my!
Solr returns an error instead, which is even worse:

INFO: [titles] webapp=/solr path=/select params={explainOther=&fl=*,score&debugQuery=on&indent=on&start=0&q=title_name_en_US:(Oh!)&hl.fl=&qt=standard&wt=standard&fq=&rows=10&version=2.2} status=400 QTime=0
May 7, 2011 2:13:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'title_name_en_US:Oh!': Encountered EOF at line 1, column 20.
Was expecting one of:
( ...
* ...
QUOTED ...
TERM ...
PREFIXTERM ...
WILDTERM ...
[ ...
{ ...
NUMBER ...
TERM ...
* ...

    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'title_name_en_US:my!': Encountered EOF at line 1, column 20.
Was expecting one of:
( ...
* ...
QUOTED ...
TERM ...
PREFIXTERM ...
WILDTERM ...
[ ...
{ ...
NUMBER ...
TERM ...
* ...

    at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:205)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-special-character-is-handled-differently-by-standard-lucene-query-parser-tp2910692p2910692.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread Yonik Seeley
On Fri, May 6, 2011 at 10:35 PM, cyang2010 ysxsu...@hotmail.com wrote:
> When user entered text contains special character, can this being taken care
> by the tokenizer/filter configured at the field?
>
> In application code, Do i need to parse the user input string and add the
> escape in front of those special character?  If so, will those special
> characters differ for different language, such as english versus chinese?
>
> As of now, I didn't parse those special character.  i am getting this
> inconsistent/strange behavior/error.  For example:
>
> 1. search: title_name_en_US:(my! god)
> solr thinks the second term god is something NOT to include, why is that?

! is a synonym for the NOT operator in Lucene query parser syntax.
The fact that it's treated as an operator even when followed by
whitespace is a bug.
This was fixed by LUCENE-2566 (which is in the trunk version, but not 3.1).

One workaround is to escape the ! or quote the term:
title_name_en_US:(my\! god)
title_name_en_US:("my!" god)
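
The escaping workaround can also be applied in application code before the
user's text is placed into the query string. The following is a minimal
Python sketch, not an official API: the function name escape_query_text and
the exact character set are assumptions modeled on the documented Lucene
query syntax. Java clients can use SolrJ's ClientUtils.escapeQueryChars or
Lucene's QueryParser.escape for the same purpose.

```python
# Characters with syntactic meaning in the Lucene query parser.
# Assumed set, modeled on the documented query syntax; check it against
# the parser version you actually run.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_text(text: str) -> str:
    """Backslash-escape every query-syntax character in user-entered text."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in text)


print(escape_query_text("my!"))       # my\!
print(escape_query_text("Oh! (no)"))  # Oh\! \(no\)
```

Only the user-entered part should be escaped. The field name, colon and
grouping parentheses are added by the application afterwards, for example
"title_name_en_US:(" + escape_query_text("my!") + " god)".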

In general, the Lucene query parser isn't meant for directly handling
literal user queries, since it has a stricter syntax (like SQL).
Something like dismax or edismax may help (try adding
defType=dismax to your request).  They are designed to try to never
throw exceptions.
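
A request with defType=dismax might be assembled as in the Python sketch
below. The host, port and the helper name build_dismax_url are assumptions
for illustration; the core name "titles" and the /select path are taken from
the log earlier in the thread, and qf is the dismax parameter that names the
fields to search.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host and port are assumptions, the core name and
# handler path come from the log line quoted in the original question.
SOLR_SELECT_URL = "http://localhost:8983/solr/titles/select"


def build_dismax_url(user_text: str) -> str:
    """Build a select URL that sends raw user text through the dismax parser."""
    params = {
        "defType": "dismax",        # switch from the standard Lucene parser
        "qf": "title_name_en_US",   # field(s) dismax should search
        "q": user_text,             # raw user input, no field prefix needed
    }
    # urlencode percent-encodes the values, so "!" and spaces are safe in the URL.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_dismax_url("my! god"))
```

With dismax the field is given in qf rather than as a prefix in q, so the
user's text can be passed through unchanged.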


-Yonik


Re: Why special character is handled differently by standard/lucene query parser?

2011-05-06 Thread cyang2010
I know about dismax, but with it I can't perform prefix and fuzzy queries.
Can edismax handle prefix and fuzzy queries?

My application logic just passes the user-entered text to the Solr server to
perform term, phrase, prefix and fuzzy queries.  I don't want
to escape the special characters by parsing the Java string, since I might
deal with text in different languages.  That is why I also asked whether
those special characters are language-specific or language-agnostic.
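
One way to keep prefix and fuzzy queries with the standard parser is to
escape only the user's text and let the application append the * or ~
operator itself. The Python sketch below assumes single-word input; the
helper names and the character set are illustrative, not part of Solr. The
syntax characters are ASCII operators of the query grammar, so the same set
should apply whatever language the text is in (an observation about the
parser syntax, not something confirmed in this thread).

```python
import re

# Assumed set of Lucene query-syntax characters; verify against your version.
_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def escape(text: str) -> str:
    """Put a backslash in front of each query-syntax character."""
    return _SPECIAL.sub(r"\\\1", text)


def prefix_query(field: str, user_text: str) -> str:
    """Prefix query: the trailing * is added after escaping, so it stays an operator."""
    return f"{field}:{escape(user_text)}*"


def fuzzy_query(field: str, user_text: str) -> str:
    """Fuzzy query: the trailing ~ is added after escaping, so it stays an operator."""
    return f"{field}:{escape(user_text)}~"


print(prefix_query("title_name_en_US", "my!"))  # title_name_en_US:my\!*
print(fuzzy_query("title_name_en_US", "god"))   # title_name_en_US:god~
```

Whether escaped punctuation such as ! then matches anything depends on how
the field's analyzer treats it at index time.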

Looking forward to your answers.


cy
