Re: Is semicolon a character that needs escaping?
On 08.09.2010 00:05 Chris Hostetter wrote:

> : Subject: Is semicolon a character that needs escaping?
> ...
> : From this I conclude that there is a bug either in the docs or in the
> : query parser or I missed something. What is wrong here?
>
> Back in Solr 1.1, the standard query parser treated ; as a special character and looked for sort instructions after it.
>
> Starting in Solr 1.2 (released in 2007) a sort param was added, and semicolon was only considered a special character if you did not explicitly mention a sort param (for back compatibility).
>
> Starting with Solr 1.4, the default was changed so that semicolon wasn't considered a meta-character even if you didn't have a sort param -- you have to explicitly select the lucenePlusSort QParser to get this behavior.
>
> I can only assume that if you are seeing this behavior, you are either using a very old version of Solr, or you have explicitly selected the lucenePlusSort parser somewhere in your params/config.
>
> This was heavily documented in CHANGES.txt for Solr 1.4 (you can find mention of it when searching for either ; or semicolon)

I am using 1.3 without a sort param, which explains it, I think. It would be nice to update to 1.4, but we try to avoid such actions on a production server as long as everything runs fine (the semicolon thing was only reported recently).

Many thanks for your detailed explanation!

-Michael
Re: Is semicolon a character that needs escaping?
: I am using 1.3 without a sort param which explains it, I think. It would
: be nice to update to 1.4 but we try to avoid such actions on a
: production server as long as everything runs fine (the semicolon thing
: was only reported recently).

If you don't currently use sort at all, then by adding a default sort param of score desc to your Solr config for that handler, you shouldn't ever have to worry about semicolons again. (I'm fairly certain Solr 1.3 supported defaults. I may be wrong, in which case you might have to add that hardcoded sort param in your client.)

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
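For the client-side fallback mentioned above, a minimal sketch in Python might look like the following. The base URL is taken from the original question; `build_select_url` is a hypothetical helper for illustration, not part of any Solr client library, and the `rows` default is just an example value.

```python
from urllib.parse import urlencode

# Base URL as it appears in the original question; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def build_select_url(query, sort="score desc", rows=10):
    """Build a select URL that always carries an explicit sort param.

    Per the explanation in this thread, Solr 1.2/1.3 only treat ';' as a
    sort separator when no sort param is given, so sending one with every
    request should sidestep the problem (assumption: hypothetical helper).
    """
    params = [("q", query), ("sort", sort), ("rows", rows)]
    # urlencode percent-encodes ':' and ';' and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


print(build_select_url("TI:stroke; AND TI:journal"))
```

Sorting by score descending is the ordering you get without any sort anyway, so the results should not change; only the semicolon handling does.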
Re: Is semicolon a character that needs escaping?
: Subject: Is semicolon a character that needs escaping?
...
: From this I conclude that there is a bug either in the docs or in the
: query parser or I missed something. What is wrong here?

Back in Solr 1.1, the standard query parser treated ; as a special character and looked for sort instructions after it.

Starting in Solr 1.2 (released in 2007) a sort param was added, and semicolon was only considered a special character if you did not explicitly mention a sort param (for back compatibility).

Starting with Solr 1.4, the default was changed so that semicolon wasn't considered a meta-character even if you didn't have a sort param -- you have to explicitly select the lucenePlusSort QParser to get this behavior.

I can only assume that if you are seeing this behavior, you are either using a very old version of Solr, or you have explicitly selected the lucenePlusSort parser somewhere in your params/config.

This was heavily documented in CHANGES.txt for Solr 1.4 (you can find mention of it when searching for either ; or semicolon).

-Hoss
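To make the behavior described above concrete, here is a rough Python model of what a pre-1.4 parser does with a semicolon when no sort param is present. It only illustrates the explanation in this message and is not Solr's actual code: `split_legacy_sort` and `parse_sort_spec` are hypothetical names, and the real parser's tokenizing rules may differ in detail.

```python
import re


def split_legacy_sort(q):
    """Split 'query;sort spec' at the first unescaped semicolon.

    Simplified model (assumption): a backslash-escaped ';' is left alone.
    """
    parts = re.split(r"(?<!\\);", q, maxsplit=1)
    query = parts[0]
    sort_spec = parts[1].strip() if len(parts) > 1 else ""
    return query, sort_spec


def parse_sort_spec(sort_spec):
    """Parse comma-separated 'field order' clauses; order must be asc/desc."""
    if not sort_spec:
        return []  # nothing after the ';' means no sort and no error
    clauses = []
    for part in sort_spec.split(","):
        tokens = part.split()
        if len(tokens) != 2:
            raise ValueError("Malformed sort clause: " + part.strip())
        field, order = tokens
        if order.lower() not in ("asc", "desc"):
            raise ValueError("Unknown sort order: " + order)
        clauses.append((field, order.lower()))
    return clauses


# The three queries from the original question.
for q in ("TI:stroke; AND TI:journal", "TI:stroke;", r"TI:stroke\; AND TI:journal"):
    query, sort_spec = split_legacy_sort(q)
    try:
        print(repr(q), "->", repr(query), parse_sort_spec(sort_spec))
    except ValueError as err:
        print(repr(q), "->", err)
```

Under this model the first query fails because "AND" is read as the sort field and "TI:journal" as the sort order, while a trailing or escaped semicolon leaves nothing to parse as a sort.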
Is semicolon a character that needs escaping?
According to http://lucene.apache.org/java/2_9_1/queryparsersyntax.html only these characters need escaping:

    + - && || ! ( ) { } [ ] ^ " ~ * ? : \

but with this simple query:

    TI:stroke; AND TI:journal

I got the error message:

    HTTP ERROR: 400 Unknown sort order: TI:journal

My first guess was that it was a URL encoding issue, but everything looks fine:

    http://localhost:8983/solr/select/?q=TI%3Astroke%3B+AND+TI%3Ajournal&version=2.2&start=0&rows=10&indent=on

As you can see, the semicolon is encoded as %3B.

There is no problem when the query ends with the semicolon:

    TI:stroke;

gives no error. The first query also works if I escape the semicolon:

    TI:stroke\; AND TI:journal

From this I conclude that there is a bug either in the docs or in the query parser or I missed something. What is wrong here?

-Michael
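The backslash workaround above can be applied programmatically to user-supplied term text before it is placed into a query. The following Python sketch assumes the character list from the Lucene query syntax page (escaping `&` and `|` individually) plus the semicolon reported in this thread; `escape_term` is a hypothetical helper, not a Solr or Lucene API.

```python
# Special characters from the Lucene query syntax page, plus ';' which
# this thread reports as special on older Solr versions (assumption:
# '&' and '|' are escaped one character at a time).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\;')


def escape_term(text):
    """Prefix every special character in a single term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


# Escape only the term text, then add the field prefix and operators,
# so the rest of the query keeps its normal Lucene syntax.
query = "TI:" + escape_term("stroke;") + " AND TI:journal"
print(query)
```

Escaping a whole query string this way would also neutralize the field separator and other operators, so it is meant for individual terms only.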
Re: Is semicolon a character that needs escaping?
On 03.09.2010 00:57 Ken Krugler wrote:

> The docs need to be updated, I believe. From some code I wrote back in 2006...
> [...]

Thanks, this explains it very well.

> But in general escaping characters in a query gets tricky - if you can directly build queries versus pre-processing text sent to the query parser, you'll save yourself some pain and suffering.

What do you mean by these two alternatives? That is, what exactly could I do better?

> Also, since I did the above code the DisMaxRequestHandler has been added to Solr, and it (IIRC) tries to be smart about handling this type of escaping for you.

Dismax is not (yet) an option because we need the full Lucene syntax within the query. Perhaps this will change with the new enhanced dismax request handler, but I haven't played with it enough (will do with the next release).

-Michael
Re: Is semicolon a character that needs escaping?
Hi Michael,

>> But in general escaping characters in a query gets tricky - if you can directly build queries versus pre-processing text sent to the query parser, you'll save yourself some pain and suffering.
>
> What do you mean by these two alternatives? That is, what exactly could I do better?

By "can build...", I meant if you can come up with a GUI whereby the user doesn't have to use special characters (other than, say, quoting), then you can take a collection of clauses and programmatically build your query, without using the query parser.

The code I wound up having to write for what seemed like simple escaping quickly got complex and convoluted - e.g. if you want to allow AND as a term, and don't want it to get processed specially by the query parser.

>> Also, since I did the above code the DisMaxRequestHandler has been added to Solr, and it (IIRC) tries to be smart about handling this type of escaping for you.
>
> Dismax is not (yet) an option because we need the full lucene syntax within the query.

OK - in that case it sounds like you're stuck with escaping.

-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g
Re: Is semicolon a character that needs escaping?
Hi Ken,

>>> But in general escaping characters in a query gets tricky - if you can directly build queries versus pre-processing text sent to the query parser, you'll save yourself some pain and suffering.
>>
>> What do you mean by these two alternatives? That is, what exactly could I do better?
>
> By "can build...", I meant if you can come up with a GUI whereby the user doesn't have to use special characters (other than say quoting) then you can take a collection of clauses and programmatically build your query, without using the query parser.

I think I have that (escaping of characters that have a special meaning in Solr). I just didn't know that the semicolon is one of them. So it would be nice if the docs could be updated to account for this.

Thanks again
-Michael