Re: Strip special chars like -
Erick, you're right. It's working, my schema looks like this:

<fieldType name="name_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="0" catenateNumbers="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>

Thanks for helping me!!

--
View this message in context: http://lucene.472066.n3.nabble.com/Strip-special-chars-like-tp3238942p3248545.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strip special chars like -
Yes, I understand the difference between generateWordParts and catenateWords, but I can't fix my problem with these options. They don't cover all the possibilities.
Re: Strip special chars like -
OK, what are the other possibilities that it doesn't fix? Just saying it won't work without some examples doesn't leave much to go on...

Best
Erick

On Tue, Aug 9, 2011 at 10:41 AM, roySolr royrutten1...@gmail.com wrote:
> Yes, I understand the difference between generateWordParts and catenateWords, but I can't fix my problem with these options. They don't cover all the possibilities.
Re: Strip special chars like -
OK, there are three query possibilities:

Manchester-united
Manchester united
Manchesterunited

The original name of the club is manchester-united.

generateWordParts fixes two of these possibilities:

Manchester-united = manchester,united

I can search for Manchester-united and manchester united. When I search for manchesterunited I get no results.

To fix this I could use catenateWords:

Manchester-united = manchesterunited

In this situation I can search for Manchester-united and manchesterunited. When I search for manchester united I get no results. So the catenateWords option also fixes only two of the situations.
Re: Strip special chars like -
Hi,

I might be wrong, as I've not tried it out to be sure, but from the wiki docs:

  These parameters may be combined in any way. Example of generateWordParts=1 and catenateWords=1:
  PowerShot -> 0:Power, 1:Shot 1:PowerShot (where 0,1,1 are token positions)

Does that fit the bill?

On 9 August 2011 16:03, roySolr royrutten1...@gmail.com wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is manchester-united.
>
> generateWordParts fixes two of these possibilities:
>
> Manchester-united = manchester,united
>
> I can search for Manchester-united and manchester united. When I search for manchesterunited I get no results.
>
> To fix this I could use catenateWords:
>
> Manchester-united = manchesterunited
>
> In this situation I can search for Manchester-united and manchesterunited. When I search for manchester united I get no results. So the catenateWords option also fixes only two of the situations.
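
The combined behaviour quoted from the wiki can be illustrated with a small Python toy model. This is only a sketch, not Solr's actual WordDelimiterFilter: the function name word_delimiter and its keyword arguments are made up for illustration, the splitting rule (case changes and non-alphanumeric delimiters) is a simplification, and the position of the joined token simply follows the wiki example above.

```python
import re

def word_delimiter(token, generate_word_parts=True, catenate_words=True):
    """Toy model of generateWordParts/catenateWords; returns (position, text) pairs."""
    # Split on case changes and on any non-alphanumeric delimiter such as '-'.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", token)
    out = []
    if generate_word_parts:
        # Each part gets its own position: 0, 1, ...
        out.extend(enumerate(parts))
    if catenate_words and len(parts) > 1:
        # Assumption taken from the wiki example: the joined form
        # shares the position of the last part.
        out.append((len(parts) - 1, "".join(parts)))
    return out

print(word_delimiter("PowerShot"))
print(word_delimiter("manchester-united"))
```

With both options on, a hyphenated term yields its parts and the joined form, which is the combination the wiki describes.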
Re: Strip special chars like -
I have done this using a custom token filter that (among other things) detects hyphenated words and converts them to the three variations, using a regex match on the incoming token:

(\w+)-(\w+)

that runs the following regex transform:

s/(\w+)-(\w+)/$1$2__$1 $2/

and then splits by __ and passes the original token, the one-word and the two-word versions through a SynonymFilter further down the chain (see Lucene in Action, 2nd Edition for code).

-sujit

On Tue, 2011-08-09 at 06:27 -0700, roySolr wrote:
> Hello,
>
> I have some terms in my index with special characters. An example is manchester-united. I want a user to be able to search for manchester-united, manchester united and manchesterunited. What's the best way to fix this? I have used the patternReplaceFilter and some tokenizers, but they couldn't fix the last situation (manchesterunited). Can someone help me?
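
The regex step sujit describes can be sketched in Python as follows. This is not his filter (that is a Java Lucene TokenFilter whose code is not shown in the thread); the function name hyphen_variants is hypothetical, and requiring the whole token to match is an assumption made here to keep the sketch simple.

```python
import re

# Same pattern as in the message above: two runs of word characters joined by '-'.
HYPHENATED = re.compile(r"(\w+)-(\w+)")

def hyphen_variants(token):
    """Return [original, one-word, two-word] for a hyphenated token, else [token]."""
    # Assumption: only tokens that match the pattern as a whole are expanded.
    if not HYPHENATED.fullmatch(token):
        return [token]
    # Equivalent of s/(\w+)-(\w+)/$1$2__$1 $2/
    rewritten = HYPHENATED.sub(r"\g<1>\g<2>__\g<1> \g<2>", token)
    # Limitation of the '__' trick: the token itself must not contain '__'.
    joined, spaced = rewritten.split("__")
    return [token, joined, spaced]

print(hyphen_variants("manchester-united"))
```

In the approach described above, the three strings would then be handed to a synonym filter so that all of them resolve to the same term; that part is left out of the sketch.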
Re: Strip special chars like -
That's not what I get. This is for Solr 3.3, but there's no reason that I know of that other versions should give different results. Here's the field def from the 3.3 example; this is just the standard implementation.

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

At index time, it produces these tokens for manchester-united:

pos 1         pos 2
manchester    united
              manchesterunited

At query time, manchesterunited isn't transformed, so it matches the token on the second row. manchester united and manchester-united both parse to manchester united and match the first row.

So somehow we're not doing the same thing. Try attaching debugQuery=on to your query and post the results. Also try looking at the admin/analysis page and see what that tells you.

Best
Erick

P.S. Did you re-index after your schema changes?

On Tue, Aug 9, 2011 at 11:03 AM, roySolr royrutten1...@gmail.com wrote:
> OK, there are three query possibilities:
>
> Manchester-united
> Manchester united
> Manchesterunited
>
> The original name of the club is manchester-united.
>
> generateWordParts fixes two of these possibilities:
>
> Manchester-united = manchester,united
>
> I can search for Manchester-united and manchester united. When I search for manchesterunited I get no results.
>
> To fix this I could use catenateWords:
>
> Manchester-united = manchesterunited
>
> In this situation I can search for Manchester-united and manchesterunited. When I search for manchester united I get no results. So the catenateWords option also fixes only two of the situations.
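
Erick's point about using different settings at index time and at query time can be checked with a rough Python simulation. It is a sketch under strong assumptions: the names analyze and matches are invented here, only whitespace tokenizing, lowercasing and the word-delimiter split are modelled, and stemming, stop words, synonyms and token positions are ignored entirely.

```python
import re

def analyze(text, catenate_words):
    """Whitespace-tokenize, lowercase, split each token on non-alphanumerics."""
    tokens = []
    for raw in text.lower().split():
        parts = [p for p in re.split(r"[^a-z0-9]+", raw) if p]
        tokens.extend(parts)  # generateWordParts=1
        if catenate_words and len(parts) > 1:
            tokens.append("".join(parts))  # catenateWords=1
    return tokens

def matches(query, indexed):
    """Simplification: a query matches if every query token was indexed."""
    return all(t in indexed for t in analyze(query, catenate_words=False))

# Index time: catenateWords=1, so the joined form is stored next to the parts.
indexed = set(analyze("manchester-united", catenate_words=True))

for q in ["Manchester-united", "Manchester united", "Manchesterunited"]:
    print(q, matches(q, indexed))
```

With catenate_words=True only on the index side, all three spellings find the document in this model; indexing with catenate_words=False makes the Manchesterunited query miss, which is the situation described earlier in the thread.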