Re: Can a field with defined synonym be searched without the synonym?

2012-12-13 Thread Walter Underwood
Perhaps you could use two indexed fields, one with synonym expansion and one 
without.

wunder

On Dec 12, 2012, at 11:33 PM, Burgmans, Tom wrote:

 In our case it's the opposite. For our clients it is very important that 
 every synonym gets equal chances in the relevancy calculation. The fact that 
 nol scores higher than net operating loss, simply because its document 
 frequency is lower, is unacceptable and a reason to look for ways to disable 
 the IDF from the score calculation. But that is in fact something I don't 
 like to do since IDF is such an elementary part of the algorithm (and very 
 useful for non-synonym searches).
 
 Pre-processing synonyms to apply 'reverse weighting' is also a strategy to 
 consider, but I agree with Walter that this is very error-prone; things could 
 easily get out of sync. Moreover, none of our Dev, QA, STG, and PRD 
 environments contain exactly the same content, so it would require a 
 differently tuned synonym dictionary for each of them...meh...
 
 In our previous search engine (FAST ESP) we basically switched off IDF, but I 
 am still a bit hoping that there is a more sophisticated solution with Solr.
 
 
 -Original Message-
 From: Walter Underwood [mailto:wun...@wunderwood.org]
 Sent: Thursday 13 December 2012 02:30
 To: solr-user@lucene.apache.org
 Subject: Re: Can a field with defined synonym be searched without the synonym?
 
 All of the applications I've seen with user control over synonym expansion 
 were recall-oriented: the "give me all matches for X" kind of problem. So 
 ranking is not as important.
 
 wunder
 
 On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:
 
 Well, this IDF problem has more sides. So, let's say your synonym file
 contains multi-token synonyms (it does, right? or perhaps you don't need
 it? well, some people do)
 
 TV, TV set, TV foo, television
 
 if you use the default synonym expansion, when you index 'television'
 
 you have increased frequency of also 'set', 'foo', so, the IDF of 'TV' is
 the same as that of 'television' - but IDF of 'foo' and 'set' has changed
 (their frequency increased, their IDF decreased) -- TV's have in fact made
 'foo' term very frequent and undesirable
 
 So, you might be sure that IDF of 'TV' and 'television' are the same, but
 you are not aware it has 'screwed' other (desirable) terms - so it really
 depends. And I wouldn't argue these cases are esoteric.
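
A rough way to see the side effect described above is a toy simulation. This is only a sketch: the analyzer is a plain whitespace split, the sample documents are made up, and the expansion simply emits every variant of the group, which approximates full index-time expansion rather than reproducing Solr's actual synonym filter.

```python
from collections import Counter

# Synonym group from the message above; expansion emits every variant.
SYNONYMS = ["tv", "tv set", "tv foo", "television"]

def tokens(doc, expand):
    """Return the set of indexed terms for one document (toy analyzer)."""
    terms = set(doc.lower().split())
    if expand and terms & {"tv", "television"}:
        for variant in SYNONYMS:
            terms.update(variant.split())
    return terms

def doc_freq(docs, expand):
    """Count how many documents contain each term."""
    df = Counter()
    for doc in docs:
        df.update(tokens(doc, expand))
    return df

# Hypothetical documents, only for illustration.
docs = ["new television review", "tv repair", "dinner set", "cheap television"]
plain = doc_freq(docs, expand=False)
expanded = doc_freq(docs, expand=True)

for term in ("tv", "television", "set", "foo"):
    print(term, plain[term], expanded[term])
```

With expansion on, 'tv' and 'television' end up with the same document frequency, but 'set' and 'foo' are now counted in every document that mentioned a television, so their IDF drops.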
 
 And finally: there are use cases out there, where people NEED to switch off
 synonym expansion at will (find only these documents, that contain the word
 'TV' and not that bloody 'foo'). This cannot be done if the index contains
 all synonym terms (unless you have a way to mark the original and the
 synonym in the index).
 
 roman
 
 

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Upayavira
You can only search against terms that are stored in your index. If you
have applied index time synonyms, you can't remove them at query time.

You can, however, use copyField to clone an incoming field to another
field that doesn't use synonyms, and search against that field instead.
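
On the client side, the choice between the two fields could look roughly like this. It is a sketch only: the field names `text` and `text_nosyn` are hypothetical, and it assumes the schema already has a copyField from the first into the second, with no synonym filter on the second.

```python
def build_query(term, exact, field="text", exact_field="text_nosyn"):
    """Route the term to the synonym-free copy when an exact match is wanted."""
    target = exact_field if exact else field
    # Escape backslashes and quotes, then quote the term so that
    # multi-word input is treated as a phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{target}:"{escaped}"'

print(build_query("home", exact=False))
print(build_query("home", exact=True))
```

The application decides when `exact` is true, for example when the user typed the term in quotation marks.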

Upayavira

On Wed, Dec 12, 2012, at 04:26 PM, joe.cohe...@gmail.com wrote:
 Hi
 I have a field type with a defined synonym.txt, which retrieves both 
 records with home and house when I search for either one of them.
 
 I want to be able to search this field on the specific value that I
 enter,
 without the synonym filter.
 
 is it possible?
 
 thanks.
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
 Sent from the Solr - User mailing list archive at Nabble.com.


RE: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Swati Swoboda
Query-time analyzers are still applied, even if you include a string in quotes. 
Would you expect foo to not match Foo just because it's enclosed in quotes?

Also look at this, someone who had similar requirements:
http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html


-Original Message-
From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com] 
Sent: Wednesday, December 12, 2012 12:09 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the synonym?


I'm applying only query-time synonyms, so I have the original values stored and 
indexed.
I would've expected that if I search for a string in quotation marks, I'll get the 
exact match, without applying a synonym.

any way to achieve that?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381p4026405.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Walter Underwood
Query time synonyms have known problems. They are slower, cause incorrect IDF, 
and don't work for phrase synonyms.

Apply synonyms at index time and you will have none of those problems.

See: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
@wunder
It is a misconception (well, supported by that wiki description) that the
query-time synonym filter has these problems. It is actually the default
parser that is causing these problems. Look at this if you still think
that index-time synonyms are a cure for all:
https://issues.apache.org/jira/browse/LUCENE-4499

@joe
If you can use the flexible query parser (as linked in by @Swati) then all
you need to do is to define a different field with a different tokenizer
chain and then swap the field names before the analyzer processes the
document (and then rewrite the field name back). For example, we have
fields called author and author_nosyn.

roman







Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Walter Underwood
Query parsers cannot fix the IDF problem or make query-time synonyms faster. 
Query synonym expansion makes more search terms. More search terms are more 
work at query time.

The IDF problem is real; I've run up against it. The rarest variant of the 
synonym has the highest score. This is probably the opposite of what you want. 
For me, it was TV and television. Documents with TV had higher scores 
than those with television. 
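
To make the effect concrete, here is a small sketch using the IDF formula from Lucene's classic TF-IDF similarity, 1 + ln(N / (df + 1)). The index size and document frequencies are made-up numbers, and real scores include more factors than IDF.

```python
import math

def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

NUM_DOCS = 100_000                      # hypothetical index size
DF = {"tv": 500, "television": 8_000}   # hypothetical document frequencies

idf = {term: classic_idf(df, NUM_DOCS) for term, df in DF.items()}
for term, value in idf.items():
    print(f"{term}: idf={value:.2f}")
```

The rarer variant gets the larger IDF, so with everything else equal, a document matching it outranks one matching the common variant.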

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Steve Rowe
But couldn't the IDF problem be fixed by applying the same IDF to all synonyms, 
e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an average, not a max.)

(E)dismax applies this query per-field, but AFAICT there is nothing stopping 
anybody (modulo query parser construction :) ) from using it on synonyms in the 
same field.

Steve




Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Steve Rowe
Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate per-doc, 
so using it in the way I suggested will not allow for synonym IDF leveling 
across documents.  Also, scoring obviously includes more factors than IDF.




Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Jack Krupansky
Sure, synonyms have lots of issues and choosing index vs. query is simply 
picking your poison, but it all depends on your app and your data and your 
user expectations, and you, the developer, have tools to moderate a lot of 
these issues.


Index-time synonyms have the problem (among others) that they cannot be 
changed without reindexing.


One technique is to simulate the query-time synonym filter expansion by 
having your app preprocess user queries to expand to the OR of the synonyms 
and then boost or de-boost the synonyms as makes sense for your app.


For example,

   (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
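
A minimal sketch of that preprocessing step, assuming the app keeps its own synonym table (the `SYNONYM_BOOSTS` dictionary and its boosts are hypothetical). Multi-word variants are quoted so the query parser reads them as a phrase.

```python
def expand_with_boosts(variants):
    """Build an OR query from (term, boost) pairs, quoting multi-word terms."""
    clauses = []
    for term, boost in variants:
        text = f'"{term}"' if " " in term else term
        clauses.append(f"{text}^{boost}")
    return "(" + " OR ".join(clauses) + ")"

# Hypothetical app-side synonym table with hand-tuned boosts.
SYNONYM_BOOSTS = {
    "tv": [("tv", 0.5), ("television", 2.5), ("boob tube", 0.0001)],
}

def preprocess(user_term):
    """Expand a known term; pass anything else through unchanged."""
    variants = SYNONYM_BOOSTS.get(user_term.lower())
    return expand_with_boosts(variants) if variants else user_term

print(preprocess("TV"))
```

Because the table lives in the app, it can change without reindexing, and the app can skip the expansion entirely when the user asks for an exact match.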

-- Jack Krupansky


Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Walter Underwood
If you have tons of content, you can do selective reindexing. You only need to 
reindex the docs containing the new terms. If I add a synonym for 
babysitter and baby sitter, then I can do a search for documents containing 
either of those, and only reindex those.

Reverse weighting to even out the IDF would work, but it could be pretty 
tweaky. If one synonym is very rare, you put in a small weight, but then you 
index several documents with that term and it is overweighted. 
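
Here is a sketch of why the tuning goes stale. It assumes the classic Lucene IDF formula, 1 + ln(N / (df + 1)), ignores every other scoring factor, and uses made-up document counts.

```python
import math

def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def reverse_weights(doc_freqs, num_docs):
    """Boost each synonym so boost * idf matches the most common variant."""
    idfs = {t: classic_idf(df, num_docs) for t, df in doc_freqs.items()}
    target = min(idfs.values())
    return {t: target / idf for t, idf in idfs.items()}

# Hypothetical index statistics at the time the weights are tuned.
NUM_DOCS = 100_000
before = {"tv": 50, "television": 8_000}
weights = reverse_weights(before, NUM_DOCS)
tuned = {t: weights[t] * classic_idf(df, NUM_DOCS) for t, df in before.items()}

# Later, 1,950 more documents containing "tv" are indexed.
after = {"tv": 2_000, "television": 8_000}
effective = {
    t: weights[t] * classic_idf(df, NUM_DOCS + 1_950) for t, df in after.items()
}

print("tuned:    ", tuned)
print("effective:", effective)
```

The weights balance the two variants only for the statistics they were computed from. Once the document frequencies move, the old weights no longer even things out and have to be recomputed.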

wunder

On Dec 12, 2012, at 4:09 PM, Jack Krupansky wrote:

 Sure, synonyms have lots of issues and choosing index vs. query is simply 
 picking your poison, but it all depends on your app and your data and your 
 user expectations, and you, the developer, have tools to moderate a lot of 
 these issues.
 
 Index-time synonyms have the problem (among others) that they cannot be 
 changed without reindexing.
 
 One technique is to simulate the query-time synonym filter expansion by 
 having your app preprocess user queries to expand to the OR of the synonyms 
 and then boost or de-boost the synonyms as makes sense for your app.
 
 For example,
 
   (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
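A minimal sketch of that application-side preprocessing, assuming a hypothetical boost table kept by the application. The terms and weights echo the example above, and multi-word synonyms are quoted so they parse as phrases.

```python
# Hypothetical boost table; the terms and weights echo the example above.
SYNONYM_BOOSTS = {
    "tv": [("tv", 0.5), ("television", 2.5), ("boob tube", 0.0001)],
}


def expand_term(term):
    """Rewrite one user term as a boosted OR over its synonyms."""
    variants = SYNONYM_BOOSTS.get(term.lower())
    if not variants:
        return term
    clauses = []
    for text, boost in variants:
        quoted = f'"{text}"' if " " in text else text
        clauses.append(f"{quoted}^{boost}")
    return "(" + " OR ".join(clauses) + ")"


def expand_query(user_query):
    """Expand every whitespace-separated term of the user's query."""
    return " ".join(expand_term(t) for t in user_query.split())


print(expand_query("cheap tv"))
# cheap (tv^0.5 OR television^2.5 OR "boob tube"^0.0001)
```

Because the expansion lives in the application, the table can change without reindexing, and an "exact term only" option is just a flag that skips `expand_term`.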
 
 -- Jack Krupansky
 
 -Original Message- From: Steve Rowe
 Sent: Wednesday, December 12, 2012 5:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Can a field with defined synonym be searched without the synonym?
 
 Hmm, I've gotten this very wrong :) - DisjunctionMaxQuery will operate 
 per-doc, so using it in the way I suggested will not allow for synonym IDF 
 leveling across documents.  Also, scoring obviously includes more factors 
 than IDF.
 
 On Dec 12, 2012, at 5:18 PM, Steve Rowe sar...@gmail.com wrote:
 
 But couldn't the IDF problem be fixed by applying the same IDF to all 
 synonyms, e.g. via DisjunctionMaxQuery?  (Maybe the ideal would be an 
 average, not a max.)
 
 (E)dismax applies this query per-field, but AFAICT there is nothing stopping 
 anybody (modulo query parser construction :) ) from using it on synonyms in 
 the same field.
 
 Steve
 
 On Dec 12, 2012, at 12:50 PM, Walter Underwood wun...@wunderwood.org wrote:
 
 Query parsers cannot fix the IDF problem or make query-time synonyms 
 faster. Query synonym expansion makes more search terms. More search terms 
 are more work at query time.
 
 The IDF problem is real; I've run up against it. The rarest variant of 
 the synonym has the highest score. This is probably the opposite of what you 
 want. For me, it was "TV" and "television". Documents with "TV" had higher 
 scores than those with "television".
 
 wunder
 
 On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
 
 @wunder
 It is a misconception (well, supported by that wiki description) that the
 query-time synonym filter has these problems. It is actually the default
 parser that is causing these problems. Look at this if you still think
 that index-time synonyms are a cure-all:
 https://issues.apache.org/jira/browse/LUCENE-4499
 
 @joe
 If you can use the flexible query parser (as linked by @Swati), then all
 you need to do is define a different field with a different tokenizer
 chain and then swap the field names before the analyzer processes the
 document (and then rewrite the field name back; for example, we have
 fields called author and author_nosyn).
 
 roman
 
 On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood 
 wun...@wunderwood.org wrote:
 
 Query time synonyms have known problems. They are slower, cause incorrect
 IDF, and don't work for phrase synonyms.
 
 Apply synonyms at index time and you will have none of those problems.
 
 See:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 
 wunder
 
 On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
 
 Query-time analyzers are still applied, even if you include a string in
 quotes. Would you expect "foo" to not match "Foo" just because it's
 enclosed in quotes?
 
 Also look at this, someone who had similar requirements:
 
 http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
 
 
 -Original Message-
 From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
 Sent: Wednesday, December 12, 2012 12:09 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Can a field with defined synonym be searched without the
 synonym?
 
 
 I'm applying only query-time synonyms, so I have the original values
 stored and indexed.
 I would've expected that if I search for a string in quotation marks, I'll get
 the exact match, without a synonym being applied.
 
 Is there any way to achieve that?
 
 

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Jack Krupansky
Another great use case for synonyms is misspellings. I saw one synonym list 
in which the top synonym was the phrase "dead mouse" (which doesn't look 
misspelled at all); I won't tell you what its proper synonym was, other 
than to say that it was VERY app/culture-dependent. It was also interesting 
because the user's original query phrase needed to be given a much lower 
weighting in order to find what the user was likely looking for.


-- Jack Krupansky

-Original Message- 
From: Walter Underwood

Sent: Wednesday, December 12, 2012 7:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the 
synonym?


If you have tons of content, you can do selective reindexing. You only need 
to reindex the docs containing the new terms. If I add a synonym for 
"babysitter" and "baby sitter", then I can do a search for documents 
containing either of those, and only reindex those.

Reverse weighting to even out the IDF would work, but it could be pretty 
tweaky. If one synonym is very rare, you put in a small weight, but then you 
index several documents with that term and then it is overweighted.


wunder


Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Walter Underwood
I prefer fuzzy search for misspellings. Solr does a very nice job with those, 
weighting them by the similarity to the matched term.
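To illustrate what "weighting by similarity" can mean, here is a sketch that scores candidate terms by edit distance. The similarity formula is an illustrative assumption, not necessarily the one Solr applies, which may differ by version.

```python
def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(query_term, indexed_term):
    """1.0 for identical terms, lower as more edits are needed (illustrative formula)."""
    longest = max(len(query_term), len(indexed_term))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query_term, indexed_term) / longest


for candidate in ("television", "televison", "telivison"):
    print(candidate, round(similarity("television", candidate), 2))
```

Unlike a synonym list, nothing has to be maintained by hand: any indexed term within a small edit distance matches, and closer matches contribute more to the score.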

wunder

On Dec 12, 2012, at 4:45 PM, Jack Krupansky wrote:

 Another great use case for synonyms is misspellings. I saw one synonym list 
 in which the top synonym was the phrase "dead mouse" (which doesn't look 
 misspelled at all); I won't tell you what its proper synonym was, other 
 than to say that it was VERY app/culture-dependent. It was also interesting 
 because the user's original query phrase needed to be given a much lower 
 weighting in order to find what the user was likely looking for.
 
 -- Jack Krupansky
 

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
Well, this IDF problem has more sides. Let's say your synonym file
contains multi-token synonyms (it does, right? or perhaps you don't need
them? well, some people do):

TV, TV set, TV foo, television

If you use the default synonym expansion, then when you index 'television'
you also increase the frequency of 'set' and 'foo'. So the IDF of 'TV' is
the same as that of 'television', but the IDF of 'foo' and 'set' has changed
(their frequency increased, their IDF decreased). The TVs have in fact made
the term 'foo' very frequent and undesirable.

So you might be sure that the IDF of 'TV' and 'television' is the same, but
you are not aware that it has 'screwed' other (desirable) terms, so it really
depends. And I wouldn't argue these cases are esoteric.
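The side effect can be reproduced on a toy corpus. This is a simplified model, not Solr's analysis chain: expansion is triggered by the single tokens 'tv' or 'television', every token of every group member is emitted, and the IDF formula is assumed to follow Lucene's classic style.

```python
import math
from collections import Counter

# One synonym group, as in the example above; every member is emitted at index time.
GROUP = ["tv", "tv set", "tv foo", "television"]


def index_terms(text, expand):
    """Return the set of terms indexed for one document."""
    terms = set(text.lower().split())
    if expand and terms & {"tv", "television"}:
        for phrase in GROUP:
            terms |= set(phrase.split())
    return terms


def doc_freqs(docs, expand):
    """Count, per term, how many documents contain it."""
    df = Counter()
    for text in docs:
        df.update(index_terms(text, expand))
    return df


def idf(term, df, num_docs):
    return 1.0 + math.log(num_docs / (df[term] + 1))


DOCS = [
    "television review",
    "new television stand",
    "tv guide",
    "foo fighters tour",
    "chess set",
    "garden tools",
]

plain = doc_freqs(DOCS, expand=False)
expanded = doc_freqs(DOCS, expand=True)
for term in ("tv", "television", "foo", "set"):
    print(term, plain[term], expanded[term])
```

After expansion, 'tv' and 'television' share one document frequency, but 'foo' and 'set' now appear in every TV document as well, so their IDF drops for searches that have nothing to do with televisions.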

And finally: there are use cases out there where people NEED to switch off
synonym expansion at will (find only those documents that contain the word
'TV' and not that bloody 'foo'). This cannot be done if the index contains
all synonym terms (unless you have a way to mark the original and the
synonym in the index).

roman


On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood wun...@wunderwood.org wrote:

 Query parsers cannot fix the IDF problem or make query-time synonyms
 faster. Query synonym expansion makes more search terms. More search terms
 are more work at query time.

 The IDF problem is real; I've run up against it. The rarest variant of
 the synonym has the highest score. This is probably the opposite of what you
 want. For me, it was "TV" and "television". Documents with "TV" had higher
 scores than those with "television".

 wunder



Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Walter Underwood
All of the applications I've seen with user control over synonym expansion 
were recall-oriented: the "give me all matches for X" kind of problem. So 
ranking is not as important.

wunder

On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:

 Well, this IDF problem has more sides. Let's say your synonym file
 contains multi-token synonyms (it does, right? or perhaps you don't need
 them? well, some people do):
 
 TV, TV set, TV foo, television
 
 If you use the default synonym expansion, then when you index 'television'
 you also increase the frequency of 'set' and 'foo'. So the IDF of 'TV' is
 the same as that of 'television', but the IDF of 'foo' and 'set' has changed
 (their frequency increased, their IDF decreased). The TVs have in fact made
 the term 'foo' very frequent and undesirable.
 
 So you might be sure that the IDF of 'TV' and 'television' is the same, but
 you are not aware that it has 'screwed' other (desirable) terms, so it really
 depends. And I wouldn't argue these cases are esoteric.
 
 And finally: there are use cases out there where people NEED to switch off
 synonym expansion at will (find only those documents that contain the word
 'TV' and not that bloody 'foo'). This cannot be done if the index contains
 all synonym terms (unless you have a way to mark the original and the
 synonym in the index).
 
 roman
 
 

RE: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Burgmans, Tom
In our case it's the opposite. For our clients it is very important that every 
synonym gets an equal chance in the relevancy calculation. The fact that "nol" 
scores higher than "net operating loss", simply because its document frequency 
is lower, is unacceptable and a reason to look for ways to remove IDF from 
the score calculation. But that is in fact something I don't like to do, since 
IDF is such an elementary part of the algorithm (and very useful for 
non-synonym searches).

Pre-processing synonyms to apply 'reverse weighting' is also a strategy to 
consider, but I agree with Walter that this is very error-prone; things could 
easily get out of sync. Moreover, none of our Dev, QA, STG, and PRD environments 
contains exactly the same content, so it would require a differently tuned synonym 
dictionary for each of them...meh...

In our previous search engine (FAST ESP) we basically switched off IDF, but I 
am still hoping a bit that there is a more sophisticated solution with Solr.


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday 13 December 2012 02:30
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the synonym?

All of the applications I've seen with user control over synonym expansion 
were recall-oriented: the "give me all matches for X" kind of problem. So 
ranking is not as important.

wunder

 -Original Message-
 From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
 Sent: Wednesday, December 12, 2012 12:09 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Can a field with defined synonym be searched without the
 synonym?


 I'm applying only query-time synonyms, so I have the original values
 stored and indexed.
 I would've expected that if I search for a string in quotation marks, I'll get
 the exact match, without applying