Re: Is semicolon a character that needs escaping?

2010-09-08 Thread Michael Lackhoff
On 08.09.2010 00:05 Chris Hostetter wrote:

 
 : Subject: Is semicolon a character that needs escaping?
   ...
 : From this I conclude that there is a bug either in the docs or in the
 : query parser or I missed something. What is wrong here?
 
 Back in Solr 1.1, the standard query parser treated ; as a special 
 character and looked for sort instructions after it.  
 
 Starting in Solr 1.2 (released in 2007) a sort param was added, and 
 semicolon was only considered a special character if you did not 
 explicitly mention a sort param (for back compatibility)
 
 Starting with Solr 1.4, the default was changed so that semicolon wasn't 
 considered a meta-character even if you didn't have a sort param -- you 
 have to explicitly select the lucenePlusSort QParser to get this 
 behavior.
 
 I can only assume that if you are seeing this behavior, you are either 
 using a very old version of Solr, or you have explicitly selected the 
 lucenePlusSort parser somewhere in your params/config.
 
 This was heavily documented in CHANGES.txt for Solr 1.4 (you can find 
 mention of it when searching for either ; or semicolon)

I am using 1.3 without a sort param which explains it, I think. It would
be nice to update to 1.4 but we try to avoid such actions on a
production server as long as everything runs fine (the semicolon thing
was only reported recently).

Many thanks for your detailed explanation!
-Michael


Re: Is semicolon a character that needs escaping?

2010-09-08 Thread Chris Hostetter

: I am using 1.3 without a sort param which explains it, I think. It would
: be nice to update to 1.4 but we try to avoid such actions on a
: production server as long as everything runs fine (the semicolon thing
: was only reported recently).

if you don't currently use sort at all, then by adding a default sort 
param of score desc to your solr config for that handler, you shouldn't 
have to ever worry about semicolons again.

(i'm fairly certain Solr 1.3 supported Defaults - i may be wrong ... you 
might have to add that hardcoded sort param in your client)
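
For the client-side variant, a minimal Python sketch might look like the following. The helper name build_select_url is hypothetical, the host and handler path are taken from the URL in the original question, and whether an explicit sort param really switches off the semicolon handling on a given 1.3 install is an assumption based on the explanation in this thread, so it is worth verifying.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a select URL that always carries an explicit sort param.

    Hypothetical helper: the explicit "sort" is what, per this thread,
    should stop Solr 1.2/1.3 from treating ';' in q as a sort separator.
    """
    params = {
        "q": query,
        "sort": "score desc",  # hardcoded default sort, as suggested above
        "start": 0,
        "rows": rows,
    }
    # urlencode percent-encodes ':' and ';' and turns spaces into '+'
    return base_url + "?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr/select/",
                       "TI:stroke; AND TI:journal")
print(url)
# http://localhost:8983/solr/select/?q=TI%3Astroke%3B+AND+TI%3Ajournal&sort=score+desc&start=0&rows=10
```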


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Is semicolon a character that needs escaping?

2010-09-07 Thread Chris Hostetter


: Subject: Is semicolon a character that needs escaping?
...
: From this I conclude that there is a bug either in the docs or in the
: query parser or I missed something. What is wrong here?

Back in Solr 1.1, the standard query parser treated ; as a special 
character and looked for sort instructions after it.  

Starting in Solr 1.2 (released in 2007) a sort param was added, and 
semicolon was only considered a special character if you did not 
explicitly mention a sort param (for back compatibility)

Starting with Solr 1.4, the default was changed so that semicolon wasn't 
considered a meta-character even if you didn't have a sort param -- you 
have to explicitly select the lucenePlusSort QParser to get this 
behavior.

I can only assume that if you are seeing this behavior, you are either 
using a very old version of Solr, or you have explicitly selected the 
lucenePlusSort parser somewhere in your params/config.

This was heavily documented in CHANGES.txt for Solr 1.4 (you can find 
mention of it when searching for either ; or semicolon)
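
To make the old behavior concrete, here is a rough Python sketch of what "split on the semicolon, then read sort instructions" amounts to. This is an illustration only, not Solr's actual code: the function names are made up, and the sort grammar is simplified to "field asc|desc" pairs.

```python
def split_legacy_query(qstr):
    """Split on the first unescaped ';' into (query, sort_spec).

    sort_spec is None when there is no unescaped semicolon at all.
    """
    i = 0
    while i < len(qstr):
        ch = qstr[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == ";":
            return qstr[:i], qstr[i + 1:]
        i += 1
    return qstr, None


def parse_legacy_sort(sort_spec):
    """Parse comma-separated 'field order' pairs (simplified grammar)."""
    clauses = []
    if sort_spec is None or not sort_spec.strip():
        return clauses  # nothing after the semicolon: no sort, no error
    for part in sort_spec.split(","):
        tokens = part.split()
        if len(tokens) != 2:
            raise ValueError("Bad sort clause: " + part.strip())
        field, order = tokens
        if order not in ("asc", "desc"):
            raise ValueError("Unknown sort order: " + order)
        clauses.append((field, order))
    return clauses


query, sort_spec = split_legacy_query("TI:stroke; AND TI:journal")
# query == "TI:stroke", sort_spec == " AND TI:journal"
# parse_legacy_sort(sort_spec) raises ValueError("Unknown sort order: TI:journal")
```

Under these assumptions the three observations from the original question fall out naturally: the query with text after the semicolon fails on the "sort order", a trailing semicolon leaves an empty sort spec, and an escaped semicolon is never split on.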



-Hoss




Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
According to http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
only these characters need escaping:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \
but with this simple query:
TI:stroke; AND TI:journal
I got the error message:
HTTP ERROR: 400
Unknown sort order: TI:journal

My first guess was that it was a URL encoding issue but everything looks
fine:
http://localhost:8983/solr/select/?q=TI%3Astroke%3B+AND+TI%3Ajournal&version=2.2&start=0&rows=10&indent=on
as you can see, the semicolon is encoded as %3B
There is no problem when the query ends with the semicolon:
TI:stroke;
gives no error.
The first query also works if I escape the semicolon:
TI:stroke\; AND TI:journal

From this I conclude that there is a bug either in the docs or in the
query parser or I missed something. What is wrong here?
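
The escaping workaround can be sketched in Python as below. The function name is hypothetical, and the character set is simply the list from the Lucene syntax page plus the semicolon; it is meant for escaping user-supplied values, not whole queries that are supposed to keep their operators.

```python
# Characters from the Lucene query syntax page, plus ';' for older Solr.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\;')


def escape_query_text(text):
    """Backslash-escape every special character in a user-supplied value."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


q = "TI:" + escape_query_text("stroke;") + " AND TI:" + escape_query_text("journal")
print(q)
# TI:stroke\; AND TI:journal
```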

-Michael


Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
On 03.09.2010 00:57 Ken Krugler wrote:

 The docs need to be updated, I believe. From some code I wrote back in  
 2006...
 [...]

Thanks, this explains it very well.

 But in general escaping characters in a query gets tricky - if you can  
 directly build queries versus pre-processing text sent to the query  
 parser, you'll save yourself some pain and suffering.

What do you mean by these two alternatives? That is, what exactly could
I do better?

 Also, since I did the above code the DisMaxRequestHandler has been  
 added to Solr, and it (IIRC) tries to be smart about handling this  
 type of escaping for you.

Dismax is not (yet) an option because we need the full lucene syntax
within the query. Perhaps this will change with the new enhanced dismax
request handler but I didn't play with it enough (will do with the next
release).

-Michael


Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Ken Krugler

Hi Michael,

 But in general escaping characters in a query gets tricky - if you can
 directly build queries versus pre-processing text sent to the query
 parser, you'll save yourself some pain and suffering.

 What do you mean by these two alternatives? That is, what exactly could
 I do better?


By "can directly build queries" I meant: if you can come up with a GUI
whereby the user doesn't have to use special characters (other than say
quoting), then you can take a collection of clauses and programmatically
build your query, without using the query parser.


The code I wound up having to write for what seemed like simple  
escaping quickly got complex and convoluted - e.g. if you want to  
allow AND as a term, and don't want it to get processed specially by  
the query parser.
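
As a rough illustration of working from a collection of clauses, here is a Python sketch. It is only a client-side approximation: the result still goes through the query parser, whereas building query objects directly would happen on the Java side. The function names are made up, field names are assumed to be trusted, and quoting turns each value into a phrase, so wildcards inside values stop working.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_query(clauses, operator="AND"):
    """Join (field, value) pairs with a boolean operator.

    Values are always quoted, so a value such as AND is sent as text
    rather than being read as an operator.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("Unsupported operator: " + operator)
    parts = [field + ":" + quote_term(value) for field, value in clauses]
    return (" " + operator + " ").join(parts)


print(build_query([("TI", "stroke"), ("TI", "AND")]))
# TI:"stroke" AND TI:"AND"
```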



 Also, since I did the above code the DisMaxRequestHandler has been
 added to Solr, and it (IIRC) tries to be smart about handling this
 type of escaping for you.

 Dismax is not (yet) an option because we need the full lucene syntax
 within the query.


OK - in that case sounds like you're stuck with escaping.


-- Ken

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g







Re: Is semicolon a character that needs escaping?

2010-09-02 Thread Michael Lackhoff
Hi Ken,

 But in general escaping characters in a query gets tricky - if you can
 directly build queries versus pre-processing text sent to the query
 parser, you'll save yourself some pain and suffering.

 What do you mean by these two alternatives? That is, what exactly could
 I do better?
 
 By "can directly build queries" I meant: if you can come up with a GUI
 whereby the user doesn't have to use special characters (other than say
 quoting), then you can take a collection of clauses and programmatically
 build your query, without using the query parser.

I think I have that (escaping of characters that have a special meaning
in Solr). I just didn't know that the semicolon is one of them. So it
would be nice if the docs could be updated to account for this.

Thanks again
-Michael