Re: Disable IDF scoring on certain fields

2011-05-18 Thread Brian Lamb
I believe I have applied the patch correctly. However, I cannot seem to
figure out where the similarity class I create should reside. Any tips on
that?

Thanks,

Brian Lamb

On Tue, May 17, 2011 at 4:00 PM, Brian Lamb
brian.l...@journalexperts.com wrote:

 Thank you Robert for pointing this out. This is not being used for
 autocomplete. I already have another core set up for that :-)

 The idea is like I outlined above. I just want a multivalued field that
 treats every term in the field the same so that the only way documents
 separate themselves is by an unrelated boost and/or matching on multiple
 terms in that field.







Disable IDF scoring on certain fields

2011-05-17 Thread Brian Lamb
Hi all,

I have a field defined in my schema.xml file as

<fieldType name="edgengram" class="solr.TextField"
           positionIncrementGap="1000">
   <analyzer>
     <tokenizer class="solr.LowerCaseTokenizerFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="25" side="front" />
   </analyzer>
</fieldType>
<field name="myfield" multiValued="true" type="edgengram" indexed="true"
       stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in
how rare the term is; I only care whether the term is present. The idea is
that if a user searches for myfield:dog OR myfield:pony, any document
containing only dog or only pony is scored identically. Documents containing
both terms move to the top, and all of those documents share the same score.

So long story short, how can I disable the idf score for this particular
field?
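
The ranking asked for here can be illustrated with a small model that does not
depend on Lucene. This is only a sketch under stated assumptions: the class
FlatScoreDemo and its score helper are hypothetical names, and the score is
simplified to the number of distinct query terms a document contains (IDF, TF
and norms all treated as constant), which is not Lucene's full scoring formula.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FlatScoreDemo {
    // Hypothetical helper: one point per distinct query term found in the document.
    // How rare the term is across the index (IDF) plays no part in the result.
    static int score(Set<String> docTerms, List<String> queryTerms) {
        int matches = 0;
        for (String term : new HashSet<>(queryTerms)) {
            if (docTerms.contains(term)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Models the query myfield:dog OR myfield:pony.
        List<String> query = Arrays.asList("dog", "pony");
        Set<String> dogOnly = new HashSet<>(Arrays.asList("dog", "cat"));
        Set<String> ponyOnly = new HashSet<>(Arrays.asList("pony"));
        Set<String> both = new HashSet<>(Arrays.asList("dog", "pony", "cat"));

        System.out.println("dog only:  " + score(dogOnly, query));
        System.out.println("pony only: " + score(ponyOnly, query));
        System.out.println("both:      " + score(both, query));
    }
}
```

Under this model the two single-term documents tie, and the document holding
both terms ranks above them, which is the ordering described above.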

Thanks,

Brian Lamb


Re: Disable IDF scoring on certain fields

2011-05-17 Thread Markus Jelsma
Hi,

Although you can configure per-field TF (by omitTermFreqAndPositions), you
can't do this for IDF. If your index is only used for this specific purpose
(seems like an auto-complete index), then you can override DefaultSimilarity
and return a static value for IDF. If you still want IDF for other fields,
then I think you have a problem because Solr doesn't yet support per-field
similarity.

http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup
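
A minimal sketch of the "static value for IDF" idea follows. To keep it
compilable without Lucene on the classpath, ClassicIdf is a stand-in written
for this example; its formula is assumed to mirror the idf(int docFreq, int
numDocs) method of the 3.x DefaultSimilarity linked above. FlatIdf and
FlatIdfDemo are likewise hypothetical names.

```java
// Stand-in for Lucene 3.x DefaultSimilarity, reduced to its idf() method.
// Assumption: the real class computes log(numDocs / (docFreq + 1)) + 1.
class ClassicIdf {
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Hypothetical subclass: every term gets the same IDF, however rare it is.
class FlatIdf extends ClassicIdf {
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

class FlatIdfDemo {
    public static void main(String[] args) {
        ClassicIdf classic = new ClassicIdf();
        ClassicIdf flat = new FlatIdf();
        // A rare term (5 of 1000 documents) versus a common one (500 of 1000).
        System.out.println("classic: " + classic.idf(5, 1000) + " vs " + classic.idf(500, 1000));
        System.out.println("flat:    " + flat.idf(5, 1000) + " vs " + flat.idf(500, 1000));
    }
}
```

In a real deployment the subclass would extend
org.apache.lucene.search.DefaultSimilarity instead of the stand-in. In Solr
releases of this era such a class is typically packaged in a jar on the core's
classpath and named in schema.xml through a <similarity class="..."/> element,
in which case it applies to every field in the index.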

Cheers,



Re: Disable IDF scoring on certain fields

2011-05-17 Thread Brian Lamb
Hi Markus,

I was just looking at overriding DefaultSimilarity, so your email was well
timed. The problem, as you mentioned, is that it does not seem possible to do
this on a field-by-field basis. Has anyone had any luck applying some of the
similarity functions per field? I need to customize more than one of them,
and from what I can find, only computeNorm takes the name of the field into
account.
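
The limitation can be made concrete: in the 3.x Similarity API, idf(int
docFreq, int numDocs) receives no field name, so a single override cannot
behave differently for different fields. The sketch below shows, in plain Java,
what field-aware dispatch would look like conceptually. IdfFunction and
PerFieldIdf are hypothetical names made up for this example and are not part
of Lucene or Solr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: an IDF function that can be chosen per field.
interface IdfFunction {
    float idf(int docFreq, int numDocs);
}

class PerFieldIdf {
    private final Map<String, IdfFunction> byField = new HashMap<>();
    private final IdfFunction fallback;

    public PerFieldIdf(IdfFunction fallback) {
        this.fallback = fallback;
    }

    public void register(String field, IdfFunction function) {
        byField.put(field, function);
    }

    // Unlike idf(docFreq, numDocs) alone, this lookup sees the field name
    // and falls back to the default function for unregistered fields.
    public float idf(String field, int docFreq, int numDocs) {
        return byField.getOrDefault(field, fallback).idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // Fallback uses the classic log-based formula (an assumption about 3.x).
        PerFieldIdf perField = new PerFieldIdf(
                (docFreq, numDocs) -> (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0));
        // Only "myfield" ignores term rarity.
        perField.register("myfield", (docFreq, numDocs) -> 1.0f);

        System.out.println("myfield: " + perField.idf("myfield", 5, 1000));
        System.out.println("title:   " + perField.idf("title", 5, 1000));
    }
}
```

Keeping the field name in the lookup is the whole point of the design: the
flat function can be confined to one field while every other field keeps the
default behaviour.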

Thanks,

Brian Lamb

On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Hi,

 Although you can configure per-field TF (by omitTermFreqAndPositions), you
 can't do this for IDF. If your index is only used for this specific purpose
 (seems like an auto-complete index), then you can override DefaultSimilarity
 and return a static value for IDF. If you still want IDF for other fields,
 then I think you have a problem because Solr doesn't yet support per-field
 similarity.


 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup

 Cheers,




Re: Disable IDF scoring on certain fields

2011-05-17 Thread Robert Muir
On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 If you still want IDF for other fields then I think you have a problem
 because Solr doesn't yet support per-field similarity.


It does in trunk: https://issues.apache.org/jira/browse/SOLR-2338


Re: Disable IDF scoring on certain fields

2011-05-17 Thread Markus Jelsma
Well, if you're feeling experimental you can try trunk; as Robert points out,
it has been fixed there. If not, I guess you're stuck with creating another
core.

Is this fieldType specifically used for auto-completion? If so, another core,
preferably on another machine, is in my opinion the way to go. Auto-completion
is tough in terms of performance.

Thanks Robert for pointing to the Jira ticket.

Cheers



Re: Disable IDF scoring on certain fields

2011-05-17 Thread Brian Lamb
Thank you Robert for pointing this out. This is not being used for
autocomplete. I already have another core set up for that :-)

The idea is like I outlined above. I just want a multivalued field that
treats every term in the field the same so that the only way documents
separate themselves is by an unrelated boost and/or matching on multiple
terms in that field.


On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Well, if you're feeling experimental you can try trunk; as Robert points
 out, it has been fixed there. If not, I guess you're stuck with creating
 another core.

 Is this fieldType specifically used for auto-completion? If so, another
 core, preferably on another machine, is in my opinion the way to go.
 Auto-completion is tough in terms of performance.

 Thanks Robert for pointing to the Jira ticket.

 Cheers
