RE: Multi word synonym problem

2009-11-23 Thread Chris Hostetter

: The response is not searching for Michael Jackson. Instead it is
: searching for (text:Micheal and text:Jackson). To monitor the parsed
: query, I turned on debugQuery, but in the present case the parsed query
: string was searching for Micheal and Jackson separately.

Using index-time synonyms isn't going to have any effect on how your
query is parsed.  The Lucene/Solr query parsers use whitespace as
markup and will still analyze each of the words in your input
separately, building up a boolean query containing each of your words
individually.  (The only way to change that is to use quotes to force
phrase query behavior, where everything in quotes is analyzed as one
chunk, or to pick a different query parser, like the field parser.)

...but none of that changes the point of *why* you can/should use
index-time synonyms for situations like this.  The point is that at
index time the alternate versions of the multi-word sequences can all
be expanded, and all variants are put in the index ... so it doesn't
matter whether you use a phrase query or term queries: all of the
synonyms are in the indexed document.
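
To make the parsing behavior concrete, here is a rough sketch in Python. It is a toy model and not Solr code: the names SYNONYMS, analyze and parse_query are made up for illustration, and the real parsers do far more than this. It only shows why a multi-word rule never fires when each word reaches the analyzer on its own.

```python
# Toy model, not Solr code: the query is split on whitespace *before*
# analysis, so a two-word synonym rule never sees both words together.

# Hypothetical rule table: source token sequence -> replacement sequence.
SYNONYMS = {("micheal", "jackson"): ("michael", "jackson")}


def analyze(text):
    """Lowercase, tokenize on whitespace, then apply multi-word synonym rules."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for src, dst in SYNONYMS.items():
            if tuple(tokens[i:i + len(src)]) == src:
                out.extend(dst)
                i += len(src)
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def parse_query(q):
    """Return one token list per clause (one clause per analysis call)."""
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return [analyze(q[1:-1])]                  # phrase: analyzed as one chunk
    return [analyze(word) for word in q.split()]   # each word analyzed alone


print(parse_query('Micheal Jackson'))    # [['micheal'], ['jackson']]
print(parse_query('"Micheal Jackson"'))  # [['michael', 'jackson']]
```

The unquoted query yields two independent clauses and the rule is never applied, which matches what debugQuery showed; the quoted query is analyzed as a unit and gets rewritten.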



-Hoss



RE: Multi word synonym problem

2009-11-20 Thread Nair, Manas
Hi,
 
I tried using the recommended approach but to no benefit. The multiword 
synonyms are still not appearing in the result.
 
My schema.xml has the following fieldType:
 
 
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This text field is the defaultSearchField too.
 
If I give the synonym for Micheal Jackson as Michael Jackson, i.e. in my
synonyms.txt file, the entry is:
Micheal Jackson => Michael Jackson
 
The response is not searching for Michael Jackson. Instead it is searching for
(text:Micheal and text:Jackson). To monitor the parsed query, I turned on
debugQuery, but in the present case the parsed query string was searching for
Micheal and Jackson separately.
 
I was somehow able to get the correct response by modifying the synonyms.txt
file. I changed the entry to:
Micheal Jackson, Michael Jackson  (replaced '=>' with ',').
 
Is there something that needs to be done with the schema part mentioned
above? I would like the synonyms to work when I map them using =>.
 
Kindly help.
 
Thank you,
Manas


From: AHMET ARSLAN [mailto:iori...@yahoo.com]
Sent: Thu 11/12/2009 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonym problem



It is recommended [1] to apply synonyms at index time only, for various
reasons, especially with multi-word synonyms.

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Only at index time, use expand=true ignoreCase=true with synonyms.txt:

micheal, michael

OR:

micheal jackson, michael jackson

Note that it matters which filters run before the synonym filter.
Be sure to restart Tomcat and re-index.

The query Micheal Jackson (not a phrase search) should then return the
results for Michael Jackson.

Hope this helps.



Multi word synonym problem

2009-11-12 Thread Nair, Manas
Hi Experts,
 
I would like help with multi-word synonyms. The scenario is as follows:
 
I have a name, Micheal Jackson (the wrong term), which has the synonym
Michael Jackson, i.e.
 
Micheal Jackson => Michael Jackson
 
When I search for Micheal Jackson (not a phrase search), Solr searches for
text:Micheal and text:Jackson, and not for Michael Jackson.
But when I search for "Micheal Jackson" (phrase search), Solr searches for
Michael Jackson (the correct term).
 
The schema.xml for the particular core contains the SynonymFilterFactory for
the text analyzer, and it is enabled at both index and query time. In both
cases the SynonymFilterFactory has the parameter expand=true.
 
Please help me understand how a multi-word synonym can be made effective, i.e.
I want a search for Micheal Jackson (not a phrase search) to return the
results for Michael Jackson.
 
What should be done so that Micheal Jackson is treated as one search term
instead of being split?
 
Any help is greatly appreciated.
 
Thank you,
Manas Nair


Re: Multi word synonym problem

2009-11-12 Thread AHMET ARSLAN
It is recommended [1] to apply synonyms at index time only, for various
reasons, especially with multi-word synonyms.

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Only at index time, use expand=true ignoreCase=true with synonyms.txt:

micheal, michael

OR:

micheal jackson, michael jackson

Note that it matters which filters run before the synonym filter.
Be sure to restart Tomcat and re-index.

The query Micheal Jackson (not a phrase search) should then return the
results for Michael Jackson.
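
As a rough illustration of why the comma-separated form with expand=true helps, here is a small Python sketch. It is a toy model and not Solr code: GROUPS, index_terms and matches_all_terms are made-up names, and it assumes every variant in a group has the same number of words, which keeps token positions simple.

```python
# Toy model of index-time synonym expansion; not Solr code.
# Simplifying assumption: all variants in a group have the same word count.

# Hypothetical in-memory form of the line "micheal jackson, michael jackson".
GROUPS = [[("micheal", "jackson"), ("michael", "jackson")]]


def index_terms(text, groups=()):
    """Return one set of terms per token position, with synonym variants added."""
    tokens = text.lower().split()
    positions = [{t} for t in tokens]
    for group in groups:
        for variant in group:
            n = len(variant)
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == variant:
                    # Stack every variant of the group on the matched positions.
                    for other in group:
                        for offset, term in enumerate(other):
                            positions[i + offset].add(term)
    return positions


def matches_all_terms(positions, query):
    """Non-phrase query: each whitespace-separated word must occur in the doc."""
    indexed = set().union(*positions)
    return all(word in indexed for word in query.lower().split())


doc = "Thriller by Michael Jackson"
print(matches_all_terms(index_terms(doc), "Micheal Jackson"))          # False
print(matches_all_terms(index_terms(doc, GROUPS), "Micheal Jackson"))  # True
```

Once both spellings are stored in the indexed document, the separate term queries for micheal and jackson both find something, so it no longer matters that the query parser splits the input.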

Hope this helps.
