Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi
I have created strings like the ones below: &&searchtext +sampletext. When I try to search for them using "&&*" or "+*", I don't get any results. I am using the QueryParser.escape(String s) method to handle the special characters, but it does not look like it did anything. Also, when I search some

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky
Maybe you are not using the same analyzer at index and query time. Even though you are correctly escaping the special query syntax characters, either the query analyzer is removing them or your index analyzer removed them. What analyzer are you using at index time? And, what analyzer are you us

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi
We use StandardAnalyzer at both index and search time. We use the default one and don't have any custom analyzers. Thanks, Sai

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky
The standard analyzer should remove those ampersands and pluses, so the core alpha terms should be matched. You would need to use the white space analyzer or a custom analyzer to preserve such special characters. Please give a specific indexed text string and a specific query that fails agains
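To make the difference concrete, here is a rough, stdlib-only Java sketch of what the two analyzers do to the example string &&searchtext +sampletext. It does not use Lucene: `whitespaceTokens` and `standardLikeTokens` are hypothetical helpers that approximate WhitespaceAnalyzer (split on whitespace, keep punctuation, no lowercasing) and StandardAnalyzer (keep runs of letters and digits, drop punctuation, lowercase). The real StandardTokenizer follows the Unicode UAX #29 word-break rules and StandardAnalyzer also removes English stop words, so this approximation differs in edge cases such as apostrophes inside words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Approximations of two Lucene analyzers for illustration only (NOT Lucene code).
class AnalyzerSketch {

    // Like WhitespaceAnalyzer: split on whitespace, keep "&&" and "+" intact.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Roughly like StandardAnalyzer: keep letter/digit runs, lowercase them,
    // and drop every other character (so "&&" and "+" disappear).
    static List<String> standardLikeTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "&&searchtext +sampletext";
        System.out.println(standardLikeTokens(text)); // [searchtext, sampletext]
        System.out.println(whitespaceTokens(text));   // [&&searchtext, +sampletext]
    }
}
```

Only the whitespace-style token list still contains the special characters, which is why a query for them can only match if the index was built that way.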

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi
Thanks. So, if I understand correctly, StandardAnalyzer won't work for the following, as it strips out the special characters and searches only on searchText (in this case): queryText = &&searchText. If we want to do a search like "&&*", then we need to use WhitespaceAnalyzer. Please l
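Why the prefix search "&&*" finds nothing can be sketched the same way. The classic QueryParser does not pass prefix or wildcard terms through the analyzer (by default it only lowercases them), so the prefix text is compared directly against the terms stored in the index. In the sketch below, the two term lists are assumed to be what each analyzer produced for &&searchtext +sampletext, and `matchPrefix` is a hypothetical stand-in for a prefix query, not Lucene code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixMatchSketch {

    // Hypothetical stand-in for a prefix query: return every indexed term
    // that starts with the given (unanalyzed) prefix.
    static List<String> matchPrefix(List<String> indexedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed index terms for "&&searchtext +sampletext" under each analyzer
        List<String> standardTerms = Arrays.asList("searchtext", "sampletext");
        List<String> whitespaceTerms = Arrays.asList("&&searchtext", "+sampletext");

        System.out.println(matchPrefix(standardTerms, "&&"));   // []
        System.out.println(matchPrefix(whitespaceTerms, "&&")); // [&&searchtext]
    }
}
```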

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Benson Margulies
It might be helpful if you would explain, at a higher level, what you are trying to accomplish. Where do these things come from? What higher-level problem are you trying to solve?

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky
Right, "Escaping Special Characters" is simply about escaping query operators like "&&" (which means "AND") and "+" (which means "AND" or "MUST"). Yes, the white space analyzer could be used, or a custom analyzer that uses the white space tokenizer and then also uses a filter to strip out any punctua
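For reference, this is roughly what QueryParser.escape does: it puts a backslash before every character that has a meaning in the classic query syntax, regardless of which analyzer is used. The sketch re-creates that behavior without Lucene; `escapeAll` is a hypothetical name, and the character set is assumed to match the 4.x classic parser, so check the escape method in your Lucene version before relying on it.

```java
// Illustrative re-creation of classic QueryParser escaping (NOT Lucene code).
class EscapeSketch {

    // Assumed set of query-syntax characters for the 4.x classic parser.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every query-syntax character with a backslash.
    static String escapeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAll("&&searchtext")); // \&\&searchtext
        System.out.println(escapeAll("+*"));           // \+\*
    }
}
```

Escaping only stops the parser from treating "&&" and "+" as operators; the escaped characters still have to survive the analyzers to be searchable.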

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread saisantoshi
What about other characters, like & and ' (quote)? We have a requirement that a text can start with 'sampletext', and when I search with '* it does not return any results, but when I search with sample* it does return the result. Thanks, Ranjith

Re: Handling special characters in Lucene 4.0

2013-10-20 Thread Jack Krupansky
Yes, other special (punctuation) characters will be preserved by the white space analyzer, but must be escaped in query strings. You will have to escape them manually with a backslash, since QueryParser.escape also escapes the asterisk, which would disable the wildcard query. -- Jack
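A minimal sketch of escaping by hand: `escapeKeepingWildcards` is a hypothetical helper (not part of Lucene) that backslash-escapes the query-syntax characters but leaves `*` alone, so a trailing asterisk still produces a wildcard/prefix query. The character list is an assumption based on the classic QueryParser syntax. Note that ' is not a query operator, so it passes through unchanged; whether '* then matches depends on the index analyzer having kept the apostrophe.

```java
// Hypothetical manual escaping that keeps '*' usable as a wildcard.
class WildcardEscapeSketch {

    // Assumed query-syntax characters, deliberately WITHOUT '*'.
    private static final String SPECIAL = "\\+-!():^[]\"{}~?|&/";

    static String escapeKeepingWildcards(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeKeepingWildcards("&&*"));      // \&\&*
        System.out.println(escapeKeepingWildcards("'sample*")); // 'sample*
    }
}
```

With WhitespaceAnalyzer at index time, an escaped string such as \&\&* should then parse to a prefix query on &&, rather than to an AND operator followed by a bare wildcard.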