Re: Synonyms problem
On 3/29/2013 12:14 PM, Plamen Mihaylov wrote:
> Can I ask you another question: I have Magento + Solr and have a requirement to create an admin magento module, where I can add/remove synonyms dynamically. Is this possible? I searched google but it seems not possible.

If you change the synonym list that you are using in your index analyzer chain, you must rebuild your entire index. If you don't, the updated synonyms will only affect newly added records. This is because the index analyzer is only applied at index time.

Thanks,
Shawn
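Shawn's point can be seen in a toy model. The sketch below is not Solr code; `ToyIndex`, `analyze`, and the single-token synonym `newyork` are made up for illustration. It only assumes what the reply states: analysis happens once, when a record is added.

```python
# Toy inverted index: index-time analysis runs once, when a document is added.

def analyze(text, synonyms):
    """Lowercase, split on whitespace, and add synonyms for single tokens."""
    tokens = set()
    for token in text.lower().split():
        tokens.add(token)
        tokens.update(synonyms.get(token, ()))
    return tokens


class ToyIndex:
    def __init__(self, synonyms):
        self.synonyms = dict(synonyms)
        self.postings = {}  # token -> set of document ids

    def add(self, doc_id, text):
        for token in analyze(text, self.synonyms):
            self.postings.setdefault(token, set()).add(doc_id)

    def search(self, term):
        # No synonym handling here: the query term is looked up as-is.
        return sorted(self.postings.get(term.lower(), set()))


index = ToyIndex(synonyms={})
index.add(1, "NY Knicks guard")

# The synonym list changes after document 1 was indexed.
index.synonyms["ny"] = ["newyork"]
index.add(2, "NY Rangers goalie")
before_reindex = index.search("newyork")

# Rebuilding from the source documents applies the new list to every record.
rebuilt = ToyIndex(synonyms={"ny": ["newyork"]})
rebuilt.add(1, "NY Knicks guard")
rebuilt.add(2, "NY Rangers goalie")
after_reindex = rebuilt.search("newyork")

print(before_reindex)  # [2]
print(after_reindex)   # [1, 2]
```

Document 1 only becomes findable through the new synonym after it has been analyzed again, which is why an admin module that edits index-time synonyms would also have to trigger a reindex.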
Synonyms problem
Hey guys,

I have the following problem: I have a website with sports players and use Solr to index their data. I have defined synonyms like: NY, New York. When I search for New York, 145 results are found, but when I search for NY, 142 results are found. Why is there a difference, and how can I fix it?

Configuration snippets:

synonyms.txt

    ...
    NY, New York
    ...

schema.xml

    ...
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
      </analyzer>
      <analyzer type="query">
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

Thanks in advance.
Plamen
Re: Synonyms problem
Hi Plamen,

You should set expand to true in the index analyzer:

    <analyzer type="index">
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
      ...

Greetings,
Thomas

On 29.03.2013 17:16, Plamen Mihaylov wrote:
> [...]

--
ontopica GmbH
Prinz-Albert-Str. 2b
53113 Bonn
Germany
fon: +49-228-227229-22
fax: +49-228-227229-77
web: http://www.ontopica.de
ontopica GmbH
Registered office: Bonn
Managing directors: Thomas Krämer, Christoph Okpue
Commercial register: Amtsgericht Bonn, HRB 17852
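For readers unsure what the expand attribute changes, here is a rough sketch of how I understand the documented behaviour for a comma-separated synonym line: with expand=true every term maps to the whole group, with expand=false every term is reduced to the first one in the list. The helper names and the single-token example terms are made up; this is not Solr's implementation.

```python
def build_synonym_map(line, expand):
    """Turn one comma-separated synonym line into a token -> replacements map."""
    terms = [t.strip().lower() for t in line.split(",")]
    if expand:
        # Every term expands to the full group.
        return {t: list(terms) for t in terms}
    # Every term is reduced to the first term of the group.
    return {t: [terms[0]] for t in terms}


def apply_synonyms(tokens, synonym_map):
    """Replace each token by its mapped tokens; unmapped tokens pass through."""
    out = []
    for token in tokens:
        out.extend(synonym_map.get(token, [token]))
    return out


line = "ny, nyc, bigapple"
tokens = ["nyc", "marathon"]

expanded = apply_synonyms(tokens, build_synonym_map(line, expand=True))
reduced = apply_synonyms(tokens, build_synonym_map(line, expand=False))

print(expanded)  # ['ny', 'nyc', 'bigapple', 'marathon']
print(reduced)   # ['ny', 'marathon']
```

With the reduced form, a query for `bigapple` only finds the document if the query side applies the same reduction, which is one reason expand=true is the usual choice when synonyms are applied on the index side only.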
Re: Synonyms problem
Also, all the filters need to be after the tokenizer. There are two synonym filters specified, one before the tokenizer and one after. I'm surprised that works at all. Shouldn't that be a fatal error when loading the config?

wunder

On Mar 29, 2013, at 9:33 AM, Thomas Krämer | ontopica wrote:
> [...]

--
Walter Underwood
wun...@wunderwood.org
Re: Synonyms problem
The XPath expressions used to collect the charFilter sequence, the tokenizer, and the token filter sequence are evaluated independently of each other; see lines 244 through 251: http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_2_0/solr/core/src/java/org/apache/solr/schema/FieldTypePluginLoader.java?view=markup#l232

Steve

On Mar 29, 2013, at 12:37 PM, Walter Underwood wun...@wunderwood.org wrote:
> [...]
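A small sketch of the idea Steve describes, using Python's standard XML library rather than the actual Solr loader: if char filters, tokenizer, and token filters are each collected with their own query, the position of the tokenizer element relative to the filter elements no longer matters. The shortened analyzer definition is only an example.

```python
import xml.etree.ElementTree as ET

# A filter is listed BEFORE the tokenizer, as in the original config.
ANALYZER_XML = """
<analyzer type="index">
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

analyzer = ET.fromstring(ANALYZER_XML)

# Three independent lookups, one per component type.
char_filters = [e.get("class") for e in analyzer.findall("./charFilter")]
tokenizers = [e.get("class") for e in analyzer.findall("./tokenizer")]
token_filters = [e.get("class") for e in analyzer.findall("./filter")]

# The chain is assembled by type, not by document order.
chain = char_filters + tokenizers + token_filters
print(chain)
```

In this model the misplaced synonym filter simply ends up as the first token filter after the tokenizer, which would explain why the config loads without an error.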
Re: Synonyms problem
Guys,

The line where expand is false is commented out. I moved the synonym filter after the tokenizer, but the result is the same.

Actual configuration:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="letterstops.txt" enablePositionIncrements="true"/>
      </analyzer>
    </fieldType>

2013/3/29 Walter Underwood wun...@wunderwood.org
> [...]
Re: Synonyms problem
There are several problems with this config.

Indexing uses the phonetic filter, but query does not. This almost guarantees that nothing will match. Numbers could match, if the filter passes them.

Query time has two stopword filters with different lists. Indexing only has one. This isn't fatal, but it is pretty weird.

Is letterstops.txt trying to do the same thing as the length filter? If so, use the length filter in both places. Or not at all. Deleting all single characters is a bad idea. You'll never find Vitamin C.

The same synonyms are used at index and query time, which is unnecessary. Only use synonyms at index time unless you really know what you are doing and have a special need.

wunder

On Mar 29, 2013, at 9:53 AM, Plamen Mihaylov wrote:
> [...]
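A toy model of the index/query mismatch from Walter's first point. The `skeleton` function is a made-up stand-in for a phonetic encoder (it is not DoubleMetaphone), and the `inject` flag here only mimics the idea of keeping the original token next to the encoded one. How much actually breaks in a real setup depends on the filter settings, so treat this as an illustration of the principle only.

```python
def skeleton(token):
    """Hypothetical stand-in for a phonetic encoder: drop vowels, uppercase."""
    return "".join(c for c in token if c not in "aeiou").upper()


def index_tokens(text, inject):
    """Index-side chain: lowercase, split, then encode each token."""
    tokens = set()
    for token in text.lower().split():
        if inject:
            tokens.add(token)        # keep the original token as well
        tokens.add(skeleton(token))  # always add the encoded form
    return tokens


def query_tokens(text):
    """Query-side chain WITHOUT the encoding step."""
    return set(text.lower().split())


def matches(doc, query, inject):
    # A hit requires every query token to exist among the indexed tokens.
    return query_tokens(query) <= index_tokens(doc, inject)


replaced_only = matches("Jordan", "jordan", inject=False)
originals_kept = matches("Jordan", "jordan", inject=True)

print(replaced_only)   # False
print(originals_kept)  # True
```

Either way, the encoded tokens on the index side are never reached by a query chain that lacks the same step, so the filter should be in both chains or in neither.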
Re: Synonyms problem
Thank you a lot, Walter. I removed most of the filters and now it returns the same number of results. It now looks simply like this:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Can I ask you another question: I have Magento + Solr and a requirement to create an admin Magento module where I can add/remove synonyms dynamically. Is this possible? I searched Google, but it seems not to be possible.

Regards
Plamen

2013/3/29 Walter Underwood wun...@wunderwood.org
> [...]
Synonyms problem
Hello,

I have some problems with synonyms. I will show some examples to describe the problem.

Data:

    High school Lissabon
    High school Barcelona
    University of applied science

When a user searches for IFD, I want all of these results back. So I want to use this synonym at query time:

    IFD => high school lissabon, high school barcelona, University of applied science

The data is stored in the field schools. The schools type looks like this:

    <fieldType name="schools" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|-"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|-"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      </analyzer>
    </fieldType>

As you can see, I use a pattern tokenizer which splits on whitespace. When I use the synonyms at query time, the analysis page shows me this:

    high       | school | lissabon  | science
    high       | school | barcelona |
    university | of     | applied   |

When I search for IFD I get no results. I found this in debugQuery:

    schools:(high high university) (school school of) (lissaban barcelona applied) (science)

Here I see the problem: Solr tries a lot of combinations, but not the right one. I thought I could escape the whitespace in the synonyms (High\ school\ Lissabon). Then the analysis page shows me better results:

    High school Lissabon
    High school Barcelona
    University of applied science

Solr then searches for "high school Lissabon" as a single term, but my index is tokenized on whitespace, so there are still no results.

I'm stuck, can someone help me?

Thanks
R

--
View this message in context: http://lucene.472066.n3.nabble.com/Synonyms-problem-tp3316287p3316287.html
Sent from the Solr - User mailing list archive at Nabble.com.
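The grouping in the debugQuery output above can be reproduced with a small toy model: if several multi-word replacements are overlaid so that word i of each phrase shares position i, the information about which words belong to the same phrase is lost. The function names are made up, and this is only a sketch of the effect, not Solr's query parser.

```python
from itertools import zip_longest


def stack_by_position(phrases):
    """Overlay phrases so that word i of every phrase lands on position i."""
    split = [p.lower().split() for p in phrases]
    return [
        [word for word in column if word is not None]
        for column in zip_longest(*split)
    ]


def fits_positions(words, positions):
    """True if each word is one of the alternatives at its position."""
    return len(words) <= len(positions) and all(
        word in alternatives for word, alternatives in zip(words, positions)
    )


expansions = [
    "high school lissabon",
    "high school barcelona",
    "university of applied science",
]
positions = stack_by_position(expansions)
for alternatives in positions:
    print(alternatives)

# A mixed-up sequence fits the stacked structure just as well as a real phrase.
mixed_up = fits_positions(["high", "of", "applied"], positions)
print(mixed_up)  # True
```

The stacked structure no longer distinguishes "high school lissabon" from "high of applied", which matches the observation that the query contains many combinations but not the intended phrases.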
Re: Synonyms problem
Simply put, it is recommended to apply multi-word synonyms at index time, as explained here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

--- On Wed, 9/7/11, roySolr royrutten1...@gmail.com wrote:
> From: roySolr royrutten1...@gmail.com
> Subject: Synonyms problem
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 7, 2011, 1:46 PM
>
> [...]
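To make the index-time suggestion concrete, here is a toy sketch: while indexing, a document that contains one of the known phrases also receives the single token `ifd`, so a one-word query for IFD can find it without any query-time expansion. The function and the phrase table are illustrative assumptions, not Solr configuration.

```python
def add_phrase_aliases(text, phrase_to_alias):
    """Tokenize on whitespace and append an alias token for each known phrase found."""
    tokens = text.lower().split()
    out = list(tokens)
    for phrase, alias in phrase_to_alias.items():
        words = phrase.lower().split()
        n = len(words)
        # Look for the phrase as a contiguous run of tokens.
        found = any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))
        if found and alias not in out:
            out.append(alias)
    return out


PHRASE_TO_ALIAS = {
    "high school lissabon": "ifd",
    "high school barcelona": "ifd",
    "university of applied science": "ifd",
}

docs = [
    "High school Lissabon",
    "High school Barcelona",
    "University of applied science",
    "High school Madrid",
]
indexed = {doc: add_phrase_aliases(doc, PHRASE_TO_ALIAS) for doc in docs}

# The query side stays trivial: a single lowercase token.
hits = [doc for doc, tokens in indexed.items() if "ifd" in tokens]
print(hits)
```

The price is the one mentioned elsewhere in this archive: whenever the phrase list changes, the affected documents have to be indexed again.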
Re: synonyms problem
What does "call synonym methods in Java" mean? That is, what are you trying to accomplish and from where?

Best
Erick

On Sun, Jun 5, 2011 at 9:48 PM, deniz denizdurmu...@gmail.com wrote:
> [...]
Re: synonyms problem
Well, what I was trying to say is that I have changed the config files for synonyms and so on, but nothing happens, so I thought I needed to do something in the Java code too. That is what I was trying to ask about.

-
Smart, but doesn't work... If he worked, he'd do it...
Re: synonyms problem
Well, I have changed it to text, but I am still confused about how to use synonyms. I also want to know how to call synonym methods in Java. I have tried to use SynonymMap and some other similar things, but nothing happens. Can anyone give me a sample, or a website where I can find examples of using Solr from Java?
synonyms problem
Hi all,

here is a piece from my solrconfig:

    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

but somehow the synonyms are not read. I mean there is no match when I use a word from the synonym file. Any ideas?
Re: synonyms problem
On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
> Hi all, here is a piece from my solrconfig:
> [...]
> but somehow synonyms are not read... I mean there is no match when i use a word in the synonym file... any ideas?
> [...]

Please provide further details, e.g., is your field in schema.xml using this fieldType, one example line from the synonyms.txt file, how are you searching, what results you expect to get, and what are the actual results.

Also, while this is not the issue here, normally the fieldType string is a non-analyzed field, and one would normally use a different fieldType, e.g., text, for data that are to be analyzed.

Regards,
Gora
Re: synonyms problem
Deniz,

it looks like you are missing an index analyzer? Or have you removed that for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty g...@mimirtech.com wrote:
> [...]
Re: synonyms problem
Oh, and it's a string field. Change this to text if you need analysis:

class="solr.StrField"

lee c

On 2 June 2011 11:45, lee carroll lee.a.carr...@googlemail.com wrote: [...]
Re: synonyms problem
Are you sure solr.StrField is the way to go with this? solr.StrField stores the entire text verbatim and, I am pretty sure, skips any analysis. Perhaps you should use solr.TextField instead.

François

On Jun 2, 2011, at 2:28 AM, deniz wrote: [...]
Re: synonyms problem
Oh, thank you for reminding me about the string and text issue. I will change it ASAP. As for the index analyzer, I just removed it for brevity. I will try again, and if it fails I will post here again. Thank you so much.

- Smart, but doesn't study... He would manage if he did...

-- View this message in context: http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3018185.html
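The replies in this thread point to two changes: use an analyzed field class (solr.TextField) instead of solr.StrField, and define an index analyzer alongside the query analyzer. A minimal sketch of what that could look like follows. The type name text_syn and the field name title are hypothetical, and the tokenizer and synonym filter attributes are copied from the original post. Applying synonyms only at query time is an assumption here, not something the thread settles.

```xml
<!-- Hypothetical type name; class changed from solr.StrField to solr.TextField
     so that the analyzer chain is actually applied. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: tokenize on whitespace only. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <!-- Query time: same tokenizer, then expand synonyms from synonyms.txt. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the type above. -->
<field name="title" type="text_syn" indexed="true" stored="true"/>
```

Documents indexed under the old field type would need to be reindexed before the new analysis takes effect, since index-time analysis is only applied when a document is added.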
synonyms problem
Hi all! I have a little problem with synonyms. When I set up my synonyms.txt file like this:

aberrant=abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical

everything is all right. But if I set up the file like this:

aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical

I get an exception saying there is not enough memory.

-- View this message in context: http://old.nabble.com/synonyms-problem-tp27987378p27987378.html
Re: synonyms problem
Have you tried increasing the memory size? We had some out-of-memory problems when we used the default memory size.

Kind regards,
Armando

michaelnazaruk wrote: [...]
Re: synonyms problem
How large is the document, and how often does 'aberrant' appear in it? Are the other words also in the document?

What is the full analysis stack? There might be interactions between the SynonymFilter and other filters. What does the admin/analysis.jsp page show? Does it throw OutOfMemory also? Does stemming turn two of the terms into the same term?

On Mon, Mar 22, 2010 at 7:48 AM, Armando Ota armando...@siol.net wrote: [...]

-- Lance Norskog goks...@gmail.com
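One more thing that could be checked alongside the questions above is how many tokens each synonym rule produces. As far as I know from the SynonymFilterFactory documentation, a comma-separated line with expand="true" maps every term on the line to every term on the line, while expand="false" maps them all to the first term only. The fragment below is a sketch under that assumption, reusing the synonyms.txt name from the thread. Whether it avoids the memory error in this particular setup is not established by anything posted here.

```xml
<!-- With expand="false", a line such as
       aberrant,abnormal,unusual
     is treated like the explicit mapping
       aberrant,abnormal,unusual => aberrant
     so each matching token is replaced by a single term
     rather than by the whole list. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"/>
```

For matches to keep working with this setting, the same reduction would have to be applied in both the index and the query analyzer, and existing documents would need to be reindexed.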