Well, sometimes people just copy-paste stuff into the search box, probably because some words (at least in my world) are very hard to spell correctly. We noticed the problem because the query was getting mangled on its way in and returned no search results even though it should have.
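
To make that kind of mangling concrete, here is a minimal sketch, assuming the container decodes the UTF-8 bytes of the query as ISO-8859-1. The class and method names (MojibakeDemo, misdecode, repair) are made up for illustration and are not part of Solr or Tomcat.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates a container that reads UTF-8 request bytes as ISO-8859-1
    // (an assumption about where the damage happens).
    static String misdecode(String query) {
        byte[] utf8 = query.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage. This round trip is lossless because
    // ISO-8859-1 maps every byte value to exactly one character.
    static String repair(String mangled) {
        byte[] raw = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ångström", written with escapes so the source file encoding does not matter.
        String query = "\u00C5ngstr\u00F6m";
        String mangled = misdecode(query);
        // Each non-ASCII character turns into two characters.
        System.out.println(query.length() + " chars in, " + mangled.length() + " chars out");
        System.out.println(repair(mangled).equals(query));
    }
}
```

A mangled term like this no longer matches anything in the index, which would be consistent with getting zero results. Repairing in application code is only for diagnosis; the cleaner fix is to have the container decode correctly in the first place.
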
Our analysis chain (both query and index) uses ASCIIFoldingFilter to fold these special characters to their "equivalent" ASCII, so a string such as "Ångström", for example, actually results in a search for "angstrom". The indexing does the same conversion.

The mangling looked very similar to what happens when UTF-8 bytes are decoded as ISO-8859-1 (or vice versa), which led us to the solution.

-sujit

On Feb 1, 2012, at 5:04 PM, Erick Erickson wrote:

> Sujit's comments are well taken, part of your problem will certainly be
> getting the special characters through your container...
>
> But another part of your problem will be having the characters in
> your index in the first place. The fact that you can find "Time" in
> the first place suggests that your index does NOT have the special
> characters, you need to look to your analysis chain to see
> what transformations occur, see the admin/analysis page...
>
> But I question why you need to search on special characters. Do
> you really expect the user to be happy with being required to
> enter "Company®"? A common approach is to remove such
> special characters during both index and query analyzing so a
> "Company®" and "Company" are equivalent.
>
> But your problem space may differ.
>
> Best
> Erick
>
> On Wed, Feb 1, 2012 at 6:55 PM, SUJIT PAL <sujit....@comcast.net> wrote:
>> Hi Tejinder,
>>
>> I had this problem yesterday (believe it or not :-)), and the fix for us was
>> to make Tomcat UTF-8 compliant. In server.xml, there is a <Connector> tag,
>> we added the attribute URIEncoding="UTF-8" and restarted Tomcat. Not sure
>> what container you are using, if it's Tomcat this will solve it, else you
>> could probably find a similar setting for your container.
>> Here is a link that provides more specific info:
>> http://struts.apache.org/2.0.6/docs/how-to-support-utf-8-uriencoding-with-tomcat.html
>>
>> -sujit
>>
>> On Feb 1, 2012, at 11:52 AM, Tejinder Rawat wrote:
>>
>>> Hi all,
>>>
>>> In my implementation many fields in documents are having words with
>>> special characters like "Company®", "Time™".
>>>
>>> Index is created using these fields. However if I make search using
>>> these keywords in Solr console, it does not work.
>>>
>>> i.e. entering "Company®" or "Time™" in search field box does not
>>> return any document. Whereas entering "Company" or "Time" returns
>>> documents.
>>>
>>> Requirement is to be able to make search with special characters in
>>> keywords.
>>>
>>> Any pointers about how to index and search in case of special
>>> characters will be greatly appreciated. Thank you.
>>>
>>>
>>> Thanks,
>>> Tejinder
>>
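
The approach suggested in this thread, applying the same normalization at index time and at query time so that "Company®" and "Company" end up equivalent, can be sketched in plain Java. This is only a rough stand-in and is not Lucene's ASCIIFoldingFilter: it assumes java.text.Normalizer plus two regex passes are close enough for illustration, the lowercasing stands in for whatever lowercase filter the analysis chain has, and the names FoldDemo and fold are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldDemo {
    // Rough approximation of folding text to plain lowercase ASCII.
    // Not the Lucene filter; the exact mappings will differ.
    static String fold(String text) {
        // NFD splits accented letters into base letter + combining mark.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed
                .replaceAll("\\p{M}", "")          // drop the combining marks
                .replaceAll("[^\\p{ASCII}]", "")   // drop symbols that are still non-ASCII
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // The same function is applied to the indexed term and to the query term.
        String indexed = fold("Company\u00AE");   // "Company®" as stored in a document
        String queried = fold("Company");         // what the user typed
        System.out.println(indexed.equals(queried));
        System.out.println(fold("\u00C5ngstr\u00F6m"));
    }
}
```

The point of the sketch is the symmetry: whatever transformation is chosen, it has to run on both sides, otherwise the indexed term and the query term never meet. In Solr itself this would be configured in the field type's analysis chain rather than written by hand.
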