
Searching for escaped characters

2011-04-28 Thread Paul

I'm trying to create a test to make sure that character sequences like &egrave; are successfully converted to their equivalent Unicode character (that is, in this case, è). So, I'd like to search my Solr index using the equivalent of the following regular expression: &\w{1,6}; To find any escaped
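For reference, the check described above can be sketched outside of Solr, on text that has already been pulled back out of the index. This is only a minimal illustration in Python, assuming the escaped sequences take the usual &name; form; the function name and the sample string are made up for the example and are not part of any Solr API.

```python
import re
from html import unescape

# Assumed pattern: an ampersand, one to six word characters, then a semicolon.
ENTITY_RE = re.compile(r"&\w{1,6};")

def find_unconverted_entities(text):
    """Return every escaped sequence that is still present in the text."""
    return ENTITY_RE.findall(text)

# Hypothetical stored field value with two sequences that were never converted.
sample = "caf&eacute; cr&egrave;me, tr\u00e8s bien"

print(find_unconverted_entities(sample))  # the leftovers a test should flag
print(unescape(sample))                   # what the converted text should look like
```

A test could then assert that find_unconverted_entities() returns an empty list for every stored value it samples.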

Re: Searching for escaped characters

2011-04-28 Thread Mike Sokolov

StandardTokenizer will have stripped punctuation, I think. You might try searching for all the entity names, though: (agrave | egrave | omacron | etc... ) The names are pretty distinctive, although you might have problems with Greek letters.

-Mike

On 04/28/2011 12:10 PM, Paul wrote:

I'm
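The suggestion to search for the entity names themselves could be sketched roughly as follows. This is an illustration only: it takes the names from Python's html.entities.name2codepoint table (the HTML 4 set, which may not cover every name you care about), the field name "text" is a placeholder, and skipping the Greek block is one guess at how to sidestep the Greek-letter problem.

```python
from html.entities import name2codepoint

def entity_names(skip_greek=True):
    """List known entity names, optionally leaving out Greek letters."""
    names = []
    for name, codepoint in sorted(name2codepoint.items()):
        # The Greek and Coptic block spans U+0370 to U+03FF.
        if skip_greek and 0x0370 <= codepoint <= 0x03FF:
            continue
        names.append(name)
    return names

def entity_name_query(field="text", names=None):
    """Build one big OR query string over the given entity names."""
    if names is None:
        names = entity_names()
    return f"{field}:({' OR '.join(names)})"

# Small example with a hand-picked list of names.
print(entity_name_query("text", ["agrave", "egrave"]))
```

Some entity names are ordinary English words ("and", "or", "not", "sum", "part"), so a real test would probably want to trim the list before querying to avoid false hits.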