Re: Synonyms problem

2013-04-03 Thread Shawn Heisey
On 3/29/2013 12:14 PM, Plamen Mihaylov wrote:
> Can I ask you another question: I have Magento + Solr and have a
> requirement to create an admin magento module, where I can add/remove
> synonyms dynamically. Is this possible? I searched google but it seems not
> possible.

If you change the synonym list that you are using in your index analyzer
chain, you must rebuild your entire index.  If you don't, the updated
synonyms will only affect newly added records.  This is because the
index analyzer is only applied at index time.
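
To make the point above concrete, here is a minimal sketch in Python rather than Solr itself. The `ToyIndex` class, the `analyze` helper and the tv/television synonym group are all hypothetical, made up for illustration; the only thing it tries to show is that terms written to the index before a synonym change stay as they were until the documents are indexed again.

```python
from collections import defaultdict


def analyze(text, synonym_groups):
    """Whitespace-tokenize, lowercase, and expand each token to its synonym group."""
    tokens = []
    for token in text.lower().split():
        # Use the first group containing the token, or the token on its own.
        group = next((g for g in synonym_groups if token in g), {token})
        tokens.extend(sorted(group))
    return tokens


class ToyIndex:
    """A toy inverted index: term -> set of document ids (not Solr's real structure)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, synonym_groups):
        # Analysis happens once, at index time; the result is what gets stored.
        for term in analyze(text, synonym_groups):
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


docs = [(1, "cheap tv"), (2, "large tv")]
new_groups = [{"tv", "television"}]  # hypothetical synonym list after the change

index = ToyIndex()
index.add(1, "cheap tv", synonym_groups=[])          # indexed before the change
index.add(2, "large tv", synonym_groups=new_groups)  # indexed after the change
before_reindex = index.search("television")          # only the newly added doc

rebuilt = ToyIndex()
for doc_id, text in docs:
    rebuilt.add(doc_id, text, new_groups)             # full rebuild with new synonyms
after_reindex = rebuilt.search("television")

print(before_reindex, after_reindex)
```

In this toy model, document 1 only becomes findable under the new synonym after the rebuild, which is the behaviour described above for index-time synonyms.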

Thanks,
Shawn



Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Hey guys,

I have the following problem: I have a website about sports players and use
Solr to index their data. I have defined synonyms like: NY, New York.
When I search for New York, 145 results are found, but when I search
for NY, 142 results are found. Why is there a difference, and how can I fix
this?

Configuration snippets:

synonyms.txt

...
NY, New York
...

--
schema.xml

...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="letterstops.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>


Thanks in advance.
Plamen


Re: Synonyms problem

2013-03-29 Thread Thomas Krämer | ontopica
Hi Plamen

You should set expand to true in the index analyzer:

<analyzer type="index">

<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
  ignoreCase="true" expand="true"/>


...

Greetings,

Thomas



-- 

ontopica GmbH
Prinz-Albert-Str. 2b
53113 Bonn
Germany
fon: +49-228-227229-22
fax: +49-228-227229-77
web: http://www.ontopica.de
Registered office: Bonn
Managing directors: Thomas Krämer, Christoph Okpue
Commercial register: Amtsgericht Bonn, HRB 17852




Re: Synonyms problem

2013-03-29 Thread Walter Underwood
Also, all the filters need to be after the tokenizer. There are two synonym 
filters specified, one before the tokenizer and one after.

I'm surprised that works at all. Shouldn't that be a fatal error when loading
the config?

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Synonyms problem

2013-03-29 Thread Steve Rowe
The XPath expressions used to collect the charFilter sequence, the tokenizer,
and the token filter sequence are evaluated independently of each other, so the
relative order of the tokenizer and filter elements is not checked. See
lines #244 through #251:

http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_2_0/solr/core/src/java/org/apache/solr/schema/FieldTypePluginLoader.java?view=markup#l232
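
The effect of those independent lookups can be imitated with a small Python sketch. This is not Solr's loader code; it only assumes, as described above, that charFilters, the tokenizer and the filters are each collected by their own query. The `collect_chain` helper is hypothetical and uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def collect_chain(analyzer_xml):
    """Collect charFilters, tokenizer and filters with three independent lookups."""
    analyzer = ET.fromstring(analyzer_xml.strip())
    char_filters = [e.get("class") for e in analyzer.findall("charFilter")]
    tokenizers = [e.get("class") for e in analyzer.findall("tokenizer")]
    filters = [e.get("class") for e in analyzer.findall("filter")]
    return char_filters, tokenizers, filters


# Synonym filter written before the tokenizer, as in the original post.
filter_first = """
<analyzer type="index">
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

# The same elements with the tokenizer written first.
tokenizer_first = """
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
"""

same_chain = collect_chain(filter_first) == collect_chain(tokenizer_first)
print(same_chain)
```

Because each element type is gathered separately, both layouts yield the same tokenizer and the same filter sequence in this sketch, which would explain why the misplaced filter loads without an error.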

Steve




Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Guys,

The line where expand is false is commented out. I moved the synonym filter
after the tokenizer, but the result is the same.

Actual configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <!-- <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="letterstops.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>


Re: Synonyms problem

2013-03-29 Thread Walter Underwood
There are several problems with this config.

Indexing uses the phonetic filter, but query does not. This almost guarantees 
that nothing will match. Numbers could match, if the filter passes them.

Query time has two stopword filters with different lists. Indexing only has
one. This isn't fatal, but it is pretty weird. Is letterstops.txt trying to do
the same thing as the length filter? If so, use the length filter in both places.
Or not at all. Deleting all single characters is a bad idea. You'll
never find Vitamin C.

The same synonyms are used at index and query time, which is unnecessary. Only 
use synonyms at index time unless you really know what you are doing and have a 
special need.
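
The Vitamin C point can be shown with a tiny Python sketch. The `length_filter` function is a hypothetical stand-in that only mimics the min/max idea of the length filter in the config above; it is not Solr code.

```python
def length_filter(tokens, min_len=2, max_len=100):
    """Keep only tokens whose length is within [min_len, max_len]."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


tokens = "Vitamin C".split()

# min_len=2 mirrors min=2 in the index analyzer: the single letter is dropped.
kept_with_min_2 = length_filter(tokens, min_len=2)

# With min_len=1 the single-character token survives.
kept_with_min_1 = length_filter(tokens, min_len=1)

print(kept_with_min_2, kept_with_min_1)
```

With a minimum length of 2, the "C" never reaches the index in this sketch, so no query for it could match.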

wunder


Re: Synonyms problem

2013-03-29 Thread Plamen Mihaylov
Thanks a lot, Walter. I removed most of the filters, and now both searches
return the same number of results. The field type now simply looks like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Can I ask you another question? I have Magento + Solr and a requirement
to create a Magento admin module where I can add and remove synonyms
dynamically. Is this possible? I searched Google, but it does not seem
to be possible.

Regards
Plamen


Synonyms problem

2011-09-07 Thread roySolr
hello,

I have some problems with synonyms. I will show some examples to describe
the problem:

Data:

High school Lissabon
High school Barcelona
University of applied science

When a user searches for IFD I want all the results back. So I want to use
this synonym at query time:

IFD => high school lissabon, high school barcelona, University of applied
science


The data is stored in the field schools.

Schools type looks like this:

<fieldType name="schools" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|- "/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\s|,|- "/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>


As you can see, I use a pattern tokenizer which splits on whitespace. When
I use the synonyms at query time, the analysis page shows me this:

high       | school | lissabon  | science
high       | school | barcelona |
university | of     | applied   |

When I search for IFD I get no results. I found this in debugQuery:

schools:(high high university) (school school of) (lissaban barcelona
applied) (science)

With this I see the problem: Solr tries a lot of combinations but not the
right one. I thought I could escape the whitespace in the synonyms
(High\ school\ Lissabon). Then the analysis page shows me better results:

High school Lissabon
High school Barcelona
University of applied science

Then Solr searches for the single token "high school Lissabon", but my index
is tokenized on whitespace, so there are still no results.


I'm stuck, can someone help me??

Thanks
R



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonyms-problem-tp3316287p3316287.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Synonyms problem

2011-09-07 Thread Ahmet Arslan
Simply put, it is recommended to apply multi-word synonyms at index time.

As explained here: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
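
Why the query-time expansion goes wrong can be sketched with a toy model in Python. The assumption here is that the multi-word replacements are laid on top of each other position by position, which is what the debugQuery output quoted below suggests; the `stack_by_position` and `readable` helpers are hypothetical and not part of Solr.

```python
from itertools import zip_longest


def stack_by_position(phrases):
    """Overlay the tokens of several phrases so they share token positions."""
    tokenized = [p.lower().split() for p in phrases]
    return [
        [t for t in column if t is not None]
        for column in zip_longest(*tokenized)
    ]


def readable(positions, phrase):
    """True if the phrase can be read off by taking one token per position."""
    words = phrase.lower().split()
    return len(words) <= len(positions) and all(
        w in column for w, column in zip(words, positions)
    )


phrases = [
    "high school lissabon",
    "high school barcelona",
    "University of applied science",
]
positions = stack_by_position(phrases)
for column in positions:
    print(column)

# The stacked positions no longer remember which tokens belonged together,
# so a phrase that was never in the synonym list can be read off them too.
mixed_up = readable(positions, "university school barcelona")
print(mixed_up)
```

In this model the per-position groups line up with the (high high university) (school school of) ... clauses from the debug output, and the original phrase boundaries are gone. Mapping the phrases to IFD while indexing avoids the overlay at query time.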





Re: synonyms problem

2011-06-06 Thread Erick Erickson
What does "call synonym methods in Java" mean? That is, what are
you trying to accomplish and from where?

Best
Erick




Re: synonyms problem

2011-06-06 Thread deniz
Well, I was trying to say that I have changed the config files for synonyms
and so on, but nothing happens, so I thought I needed to do something in Java
code too. That is what I was trying to ask about.

-
Smart, but doesn't study... He'd manage if he studied...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3032666.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: synonyms problem

2011-06-05 Thread deniz
Well, I have changed it to text, but I am still confused about how to use
synonyms.

I also want to know how to call synonym methods in Java. I have tried
to use SynonymMap and some other similar things, but nothing happens.
Can anyone give me a sample, or a website where I can find examples of
using Solr from Java?

-
Smart, but doesn't study... He'd manage if he studied...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3028353.html
Sent from the Solr - User mailing list archive at Nabble.com.


synonyms problem

2011-06-02 Thread deniz
Hi all,

here is a piece from my solrconfig:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>


but somehow synonyms are not read... I mean there is no match when i use a
word in the synonym file... any ideas?

-
Smart, but doesn't work... If he worked, he would do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3014006.html


Re: synonyms problem

2011-06-02 Thread Gora Mohanty
On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solrconfig:
[...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
[...]

Please provide further details, e.g., is your field in schema.xml using
this fieldType, one example line from the synonyms.txt file, how are
you searching, what results you expect to get, and what are the actual
results.

Also, while this is not the issue here, normally the fieldType
string is a non-analyzed field, and one would normally use
a different fieldType, e.g., text for data that are to be analyzed.
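
As a minimal sketch of the field declaration the questions above refer to: a field in schema.xml picks up an analyzed type through its type attribute. The field name "description" is hypothetical and not taken from this thread, and the type name "text" is assumed to be defined elsewhere in the same schema.xml as a solr.TextField.

```xml
<!-- Hypothetical field: "description" is an illustrative name, not from the thread. -->
<!-- type="text" assumes a solr.TextField fieldType named "text" exists in this schema. -->
<field name="description" type="text" indexed="true" stored="true"/>
```

Queries against such a field go through the query analyzer of the "text" type, which is where a SynonymFilterFactory would take effect.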

Regards,
Gora


Re: synonyms problem

2011-06-02 Thread lee carroll
Deniz,

It looks like you are missing an index analyzer. Or have you removed
that for brevity?

lee c

On 2 June 2011 10:41, Gora Mohanty g...@mimirtech.com wrote:
 On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solrconfig:
 [...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
 [...]

 Please provide further details, e.g., is your field in schema.xml using
 this fieldType, one example line from the synonyms.txt file, how are
 you searching, what results you expect to get, and what are the actual
 results.

 Also, while this is not the issue here, normally the fieldType
 string is a non-analyzed field, and one would normally use
 a different fieldType, e.g., text for data that are to be analyzed.

 Regards,
 Gora



Re: synonyms problem

2011-06-02 Thread lee carroll
Oh, and it's a string field. Change this to text if you need analysis:

class="solr.StrField"

lee c

On 2 June 2011 11:45, lee carroll lee.a.carr...@googlemail.com wrote:
 Deniz,

 It looks like you are missing an index analyzer. Or have you removed
 that for brevity?

 lee c

 On 2 June 2011 10:41, Gora Mohanty g...@mimirtech.com wrote:
 On Thu, Jun 2, 2011 at 11:58 AM, deniz denizdurmu...@gmail.com wrote:
 Hi all,

 here is a piece from my solrconfig:
 [...]
 but somehow synonyms are not read... I mean there is no match when i use a
 word in the synonym file... any ideas?
 [...]

 Please provide further details, e.g., is your field in schema.xml using
 this fieldType, one example line from the synonyms.txt file, how are
 you searching, what results you expect to get, and what are the actual
 results.

 Also, while this is not the issue here, normally the fieldType
 string is a non-analyzed field, and one would normally use
 a different fieldType, e.g., text for data that are to be analyzed.

 Regards,
 Gora




Re: synonyms problem

2011-06-02 Thread François Schiettecatte
Are you sure solr.StrField is the way to go with this? solr.StrField stores the 
entire text verbatim and I am pretty sure skips any analysis. Perhaps you 
should use solr.TextField instead.
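
A rough sketch of what the suggested change could look like, assuming the type is renamed to "text" and given both an index and a query analyzer. The lowercase filter and the positionIncrementGap value are assumptions added for illustration; only the tokenizer and the synonym filter settings come from the snippet quoted below.

```xml
<!-- Sketch only: an analyzed replacement for the solr.StrField type in the question. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Applied when documents are indexed (assumed chain, not from the thread). -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Applied to query text; synonyms are expanded here, as in the original snippet. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the synonym filter in the query analyzer only means synonyms.txt can be edited without rebuilding the index. If the filter is moved into the index analyzer instead, existing documents have to be reindexed before changes take effect.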

François

On Jun 2, 2011, at 2:28 AM, deniz wrote:

 Hi all,
 
 Here is a piece from my solrconfig:
 
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true">
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   </analyzer>
 </fieldType>
 
 But somehow the synonyms are not read... I mean there is no match when I use a
 word from the synonym file... Any ideas?
 
 -
 Smart, but doesn't work... If he worked, he would do it...
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3014006.html



Re: synonyms problem

2011-06-02 Thread deniz
Oh, thank you for reminding me about the string and text issue... I will change
it ASAP... As for the index analyzer, I just removed it for brevity...

I will try again, and if it fails I will post here again...

Thank you so much.

-
Smart, but doesn't work... If he worked, he would do it...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/synonyms-problem-tp3014006p3018185.html


synonyms problem

2010-03-22 Thread michaelnazaruk

Hi all! I have a little problem with synonyms.
When I write the entry in my synonyms.txt file as:
aberrant => abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
it's all right! But if I write the entry as:
aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
I get an out-of-memory exception.
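
For context, the two rule forms behave differently in Solr's synonyms file format. The sketch below summarizes them in comments next to an assumed filter declaration; the shortened term lists are illustrative. Whether the larger expansion is what triggers the memory error here is not established in this thread.

```xml
<!-- synonyms.txt, explicit mapping: a token matching the left side is replaced
     by the terms on the right side.
       aberrant => abnormal,unusual,deviant
     synonyms.txt, equivalence list: with expand="true" every term in the list
     maps to all terms in the list, the same as writing
       aberrant,abnormal,unusual,deviant => aberrant,abnormal,unusual,deviant
     With expand="false" the list maps every term to the first term only. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
```

The equivalence form therefore produces more mappings per rule than the explicit form, so the two files are not interchangeable.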

-- 
View this message in context: 
http://old.nabble.com/synonyms-problem-tp27987378p27987378.html



Re: synonyms problem

2010-03-22 Thread Armando Ota

Have you tried increasing the memory size?

We had some out-of-memory problems when we used the default memory size.

Kind regards

Armando

michaelnazaruk wrote:

Hi all! I have a little problem with synonyms.
When I write the entry in my synonyms.txt file as:
aberrant => abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
it's all right! But if I write the entry as:
aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
I get an out-of-memory exception.

  


Re: synonyms problem

2010-03-22 Thread Lance Norskog
How large is the document, and how often does 'aberrant' appear in it?
Are the other words also in the document?

What is the full analysis stack? There might be interactions between
the SynonymFilter and other filters.

What does the admin/analysis.jsp page show? Does it throw OutOfMemory also?

Does stemming turn two of the terms into the same term?

On Mon, Mar 22, 2010 at 7:48 AM, Armando Ota armando...@siol.net wrote:
 Have you tried increasing the memory size?

 We had some out-of-memory problems when we used the default memory size.

 Kind regards

 Armando

 michaelnazaruk wrote:

 Hi all! I have a little problem with synonyms.
 When I write the entry in my synonyms.txt file as:

 aberrant => abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
 it's all right! But if I write the entry as:

 aberrant,abnormal,unusual,deviant,anomalous,peculiar,uncharacteristic,irregular,atypical
 I get an out-of-memory exception.






-- 
Lance Norskog
goks...@gmail.com