Re: Solr french search optimisation

2013-05-23 Thread It-forum

Hello again,

Could anyone help me, please?

David

On 22/05/2013 18:09, It-forum wrote:

Hello to all,

I'm trying to set up Solr 4.2 to index and search French content.

I defined a dedicated field type for French content:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">

  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>

  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>

</fieldType>


Unfortunately, this field does not behave as I would like.

I'd like to get results even when a word is misspelled.

For example, I'd like to get the same results when typing "pomppe a chaler" as
when typing "Pompe à chaleur", or when typing "solère" instead of "solaire".


I can't find the right way to define a field type that achieves this.

Thanks in advance for your help; don't hesitate to ask if you need more
information.


Regards

David






Re: Solr french search optimisation

2013-05-23 Thread Cristian Cascetta
Hello,

I think you're confusing three different things:

1) Schema and field definitions control precision and recall: analyzing a
field differently changes which results match and how they are ranked.
2) The "pomppe a chaler" problem is more of a spellchecking problem:
http://wiki.apache.org/solr/SpellCheckComponent
3) "solère" vs. "solaire" is a phonetic search problem:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory
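
To illustrate point 3, a phonetic field type could look roughly like the sketch below. The name text_fr_phonetic is made up for this example, and DoubleMetaphone is only one of the encoders that PhoneticFilterFactory accepts. It was designed around English pronunciation, so whether "solère" and "solaire" really end up with the same code should be checked on the Analysis page of the Solr admin UI before relying on it.

```xml
<!-- Hypothetical field type: "text_fr_phonetic" is an illustrative name,
     not part of the original schema. -->
<fieldType name="text_fr_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Fold accented characters first, as in the text_fr type -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false" replaces each token with its phonetic code;
         inject="true" would keep the original token as well -->
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>
```

Using the same analyzer at index and query time keeps the codes comparable; no stemmer is included because stemming a phonetic code is unlikely to be meaningful.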

Hope this helps a little,

cristian







Re: Solr french search optimisation

2013-05-23 Thread fbrisbart
You could also use a SynonymFilter if you can list the misspelled words.

It's a quick and dirty solution, but adding a "pomppe -> pompe" entry to a
synonym list is easier than tuning a phonetic filter.
NB: reindexing is required whenever the synonyms file changes (at least when
the filter is part of the index-time analyzer).
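
A minimal sketch of this approach, assuming the conventional synonyms.txt file name and two illustrative entries (the real list has to come from misspellings actually seen in queries):

```xml
<!-- synonyms.txt would contain one rule per line, for example:
       pomppe => pompe
       chaler => chaleur
     Both entries are illustrative assumptions. -->

<!-- Placed in the analyzer chain after LowerCaseFilterFactory and before
     SnowballPorterFilterFactory, so the corrected word is still stemmed -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
```

With explicit "=>" rules, each misspelling on the left is replaced by the word on the right.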

Franck Brisbart





Re: Solr french search optimisation

2013-05-23 Thread It-forum

Hello,

Thanks, Cristian, for the details.

I totally agree with your explanation: these are different aspects that I
need to solve.


Could you clarify a few more things:

- Should SpellCheckComponent and the phonetic filter be used while indexing,
or only while querying?


- Does the spellcheck component only return the correct spelling, or is it
also used to search for results?


- If I want to handle spelling, phonetics, and stemming for French, can I use
a single field, or should I use several fields with different filters?


Regards

David








Re: Solr french search optimisation

2013-05-23 Thread Cristian Cascetta
> Could you clarify a few more things:
>
> - Should SpellCheckComponent and the phonetic filter be used while indexing,
> or only while querying?


SpellCheck: you can either designate a specific field as the source for
spellchecking (in that sense it is set up in the schema and applied at query
time), or you can provide a dedicated vocabulary for spellchecking. I strongly
suggest going through the documentation for this component
(http://wiki.apache.org/solr/SpellCheckComponent); every time I've used it,
I've had to customize and adapt the configuration.
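
As a starting point only, a field-based spellchecker in solrconfig.xml might look like the sketch below. The field name spell_fr is hypothetical (a lightly analyzed, unstemmed copy of the content), text_general is assumed to exist as in the example schema, and the thresholds are the kind of values that usually need tuning.

```xml
<!-- solrconfig.xml sketch; "spell_fr" is a hypothetical source field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Field type whose analyzer is applied to the query terms being checked -->
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell_fr</str>
    <!-- Works directly on the main index, no separate spellcheck index to build -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <!-- "pomppe" to "pompe" and "chaler" to "chaleur" are each within two edits -->
    <int name="maxEdits">2</int>
    <!-- The first character must match, which limits the candidates examined -->
    <int name="minPrefix">1</int>
  </lst>
</searchComponent>
```

The source field should not be stemmed, otherwise the suggestions returned would be stems rather than real words.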



> - Does the spellcheck component only return the correct spelling, or is it
> also used to search for results?


I'm not sure, so please check the documentation, but I remember that you can
configure it to re-execute the spell-corrected query directly AND show some
alternatives/suggestions to the user (obviously this is a display/front-end
choice).
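
A hedged sketch of the request-handler side, assuming a search component named "spellcheck" with a dictionary called "default" is already defined. As far as I know, collation returns a rewritten query string and can test it against the index, but the response still holds the results of the original query, so the client would re-send the collated query itself.

```xml
<!-- solrconfig.xml sketch; handler name and values are illustrative -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <!-- Up to five suggestions per misspelled term -->
    <str name="spellcheck.count">5</str>
    <!-- Also return a whole corrected query, e.g. for a "did you mean" link -->
    <str name="spellcheck.collate">true</str>
    <!-- Try candidate collations against the index and keep one that has hits -->
    <str name="spellcheck.maxCollationTries">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```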



> - If I want to handle spelling, phonetics, and stemming for French, can I
> use a single field, or should I use several fields with different filters?



I don't think it's possible to use only one field. In my experience, it is
better to use multiple fields for multiple purposes. If you're worried about
index size, remember that fields that are indexed but NOT stored don't grow
your index that much. Mark as stored only the fields you need to display to
the end user.
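
One way to lay this out in schema.xml is sketched below. All field names are illustrative, and the sketch assumes that a phonetic field type named text_fr_phonetic and a general-purpose type named text_general are defined in the schema.

```xml
<!-- schema.xml sketch; only "content" is stored, the other fields exist for matching -->
<field name="content" type="text_fr" indexed="true" stored="true"/>
<field name="content_phonetic" type="text_fr_phonetic" indexed="true" stored="false"/>
<field name="spell_fr" type="text_general" indexed="true" stored="false"/>

<!-- copyField passes the raw input on, so each destination applies its own analyzer -->
<copyField source="content" dest="content_phonetic"/>
<copyField source="content" dest="spell_fr"/>
```

A query could then search both content and content_phonetic, for instance through the qf parameter of the edismax parser with a higher boost on content, so that exact and stemmed matches rank above purely phonetic ones.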

