Hi,

Let's say you have accomplished what you want: you have a .txt with the tokens to merge, like "European" and "Parliament". What is your use case then? What is your high-level goal?
The MappingCharFilter approach is closer to your .txt approach than the PatternReplaceCharFilterFactory approach. By the way, it could also be simulated with ShingleFilterFactory + KeepWordFilterFactory + TypeTokenFilterFactory.

Maybe it can also be done by firing phrase queries at query time (without touching the index) on the client side, e.g. q="European Parliament"~0?

On Friday, February 28, 2014 11:55 AM, epnRui <rui_banda...@hotmail.com> wrote:

Hi Ahmet!!

I went ahead and did something I thought was not a clean solution, and then when I read your post I found we had thought of the same solution, including "European_Parliament" with the underscore :) So I guess there is no cleaner way to do this, short of implementing my own Tokenizer and Filters, and I honestly couldn't find a tutorial on implementing a custom Solr Tokenizer. If I end up needing to do it, I will write a tutorial.

So for now I'm using PatternReplaceCharFilterFactory to replace "European Parliament" with <MD5Hash>European_Parliament (initially I didn't use the MD5 hash, just European_Parliament). Then, after StandardTokenizerFactory has run, I replace it back to "European Parliament". Well, I guess I just found a way to make a two-word token :)

I had seen ShingleFilterFactory, but the problem is that I don't need the whole phrase split into two-word tokens, and as I understand it, that is what it does. Of course I would still need some filter that reads a .txt with the tokens to merge, like "European" and "Parliament".

I'm still having another problem, but maybe I'll find a solution after I read the page you attached, which seems great. Solr is treating #European as both #European and European, meaning it creates two facets for one token. I want it to be considered only as #European. I ran the analysis debugger in my Solr admin console and I don't see how it can be doing that. Would you know of a reason for this?

Thanks for your reply; the page you attached seems excellent and I'll read it through.
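For concreteness, a minimal sketch of the MappingCharFilter approach suggested above; the field type name, file name, and surrounding analyzer chain are illustrative assumptions, not taken from this thread:

```xml
<!-- schema.xml sketch: "text_merged" and "merge-tokens.txt" are hypothetical names -->
<fieldType name="text_merged" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs on the raw character stream, before tokenization, so the
         merged form reaches StandardTokenizer as a single token -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="merge-tokens.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

with merge-tokens.txt holding one rule per line in the mapping-file syntax:

```
"European Parliament" => "European_Parliament"
```

This keeps the merge list in a plain .txt file, which is exactly the "tokens to merge" file discussed above, instead of hard-coding each phrase in a PatternReplaceCharFilterFactory regex.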
-- View this message in context: http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120361.html Sent from the Solr - User mailing list archive at Nabble.com.