Re: Solr french search optimisation
Hello again,

Could anyone help me, please?

David

(Reply to It-forum's message of 22/05/2013 18:09.)
Re: Solr french search optimisation
Hello,

I think you're confusing three different things:

1) Schema and field definitions govern precision and recall: analyzing a field differently changes which results are returned and how they are ranked.
2) The "pomppe a chaler" problem is more of a spellchecking problem: http://wiki.apache.org/solr/SpellCheckComponent
3) "solère" versus "solaire" is a phonetic search problem: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory

Hope this helps a little,
cristian

(Reply to the message from It-forum, it-fo...@meseo.fr, of 2013/5/23.)
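To illustrate point 3, a phonetic field type could look roughly like the sketch below. The type name text_fr_phonetic, the StandardTokenizerFactory and the DoubleMetaphone encoder are assumptions chosen for illustration, not something the thread specifies. DoubleMetaphone is tuned mainly for English, so whether it maps "solère" and "solaire" to the same code should be checked on the Solr analysis page before relying on it.

```xml
<!-- schema.xml: hypothetical phonetic field type (the name is illustrative) -->
<fieldType name="text_fr_phonetic" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false" replaces each token with its phonetic code;
         inject="true" would keep the original token as well -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>
```

Because the phonetic codes are stored in the index, the same filter has to run at both index and query time, which is why the sketch uses one shared analyzer.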
Re: Solr french search optimisation
If you can list the misspelled words, you can also consider using a SynonymFilter. It's a quick and dirty solution, but adding a pomppe -> pompe entry to a synonym list is easier than tuning a phonetic filter.

NB: reindexing is required whenever the synonyms file changes (at least when the filter is applied at index time).

Franck Brisbart

(Reply to Cristian Cascetta's message of Thursday 23 May 2013, 08:59 +0200.)
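A minimal sketch of the synonym approach, assuming a file named synonyms.txt in the core's conf directory and that the filter is added to the text_fr analyzer after the LowerCaseFilterFactory and before the stemmer. The placement and the attribute values are illustrative choices, not taken from the thread.

```xml
<!-- schema.xml: line to add inside an <analyzer> of the text_fr field type.
     synonyms.txt would contain one explicit mapping per line, for example:
       pomppe => pompe
       chaler => chaleur
-->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
```

If the filter sits in the index-time analyzer, documents have to be reindexed after each change to the file. If it sits only in the query-time analyzer, reloading the core is enough for the new mappings to take effect.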
Re: Solr french search optimisation
Hello,

Thanks, Cristian, for the details. I totally agree with your explanation: these are different aspects that I need to solve.

Could you clarify a few more things:

- Should SpellCheckComponent and the phonetic filter be used while indexing, or only while querying?
- Does the spellcheck component only return the correct spelling, or is it also used to search the results?
- If I want to solve spelling, phonetic and stemming problems for French, can I use a single field, or should I use several fields with different filters?

Regards
David

(Reply to Cristian Cascetta's message of 23/05/2013 08:59.)
Re: Solr french search optimisation
Could you clarify a few more things:

- Should SpellCheckComponent and the phonetic filter be used while indexing, or only while querying?

SpellCheck: you can define a specific field for spellchecking (in that sense it involves both the schema and query time), or you can create a dedicated vocabulary for spellchecking. I strongly suggest going through the documentation for this component, http://wiki.apache.org/solr/SpellCheckComponent ; every time I have used it, I needed to customize and adapt the configuration.

- Does the spellcheck component only return the correct spelling, or is it also used to search the results?

I'm not sure, so please check the documentation, but I remember that you can configure it to directly re-execute the spell-corrected query AND show some alternatives/suggestions to the user (obviously this is a display/frontend choice).

- If I want to solve spelling, phonetic and stemming problems for French, can I use a single field, or should I use several fields with different filters?

I don't think it's possible with a single field. In my experience, it is better to use multiple fields, one per purpose. If you're worried about index size, remember that fields that are indexed and NOT stored don't grow your index that much. Mark as stored only the fields you need to display to the end user.
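The multi-field layout and the spellcheck setup described above might be wired together roughly as follows. This is a sketch under stated assumptions: the field names content, content_phonetic and spell, and the field types text_fr_phonetic and text_spell, are hypothetical and would have to be defined in your own schema (text_spell is assumed to be a lightly analyzed type without stemming). DirectSolrSpellChecker is used here because it reads terms straight from the main index; other spellchecker implementations are configured differently.

```xml
<!-- schema.xml: one source field copied into purpose-specific fields (names are illustrative) -->
<field name="content" type="text_fr" indexed="true" stored="true"/>
<!-- indexed but not stored: searchable, never returned to the user -->
<field name="content_phonetic" type="text_fr_phonetic" indexed="true" stored="false"/>
<field name="spell" type="text_spell" indexed="true" stored="false"/>
<copyField source="content" dest="content_phonetic"/>
<copyField source="content" dest="spell"/>

<!-- solrconfig.xml: spellchecker built on the hypothetical "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<!-- solrconfig.xml: attach the component to the search handler -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- return a rewritten query built from the best suggestions -->
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With this kind of setup the suggestions and the collated query come back alongside the results for the original query; re-running the search with the corrected query is typically left to the client application.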
Solr french search optimisation
Hello to all,

I'm trying to set up Solr 4.2 to index and search French content.

I defined a special field type for French content:

    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

Unfortunately, this field does not behave as I wish. I'd like to be able to get results for misspelled words. For example, I want typing "pomppe a chaler" to return the same results as typing "Pompe à chaleur", and likewise for "solère" and "solaire".

I cannot find the right way to define a field type that achieves this.

Thanks in advance for your help; do not hesitate to ask for more information if needed.

Regards
David