Index not respecting Omit Norms

2014-08-19 Thread Tim.Cardwell
Please reference the below images:

http://lucene.472066.n3.nabble.com/file/n4153863/Schema.png 

http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionSchemaBrowser.png
 

http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionDebugResults.png
 

As you can see from the first image, the text field-type doesn't define the
omitNorms flag, meaning it is set to false. Also in the first image you can
see that the description field doesn't define the omitNorms flag, again
meaning it is set to false (the default for omitNorms is false). This is all
confirmed by the second image, where the Properties and Schema rows have
Omit Norms checked.
I am having some issues understanding why some results have a fieldNorm set
to 1 for matches on the description field. As you can see from the third
image, the description field has a rather large number of terms in it, yet
the fieldNorm is being set to 1.0 for matching 'supply' on the description
field. My guess is that the Omit Norms flag for the 'Index' row is causing
the issue.
Questions:
  
From the first picture, can anyone tell me what each row (Properties, Schema
and Index) refers to? I think the Properties row refers to the flags set
when defining the field type, which for this field is text. The Schema row
refers to the flags set when defining the field, which is description. I'm
less sure where the Index row flags come from, but I'm assuming it reflects
what the index actually contains?
Am I right in assuming the Omit Norms flag in the Index row of the first
picture is what is causing the fieldNorm issues in the third image?
If I am correct in the above question, how do I fix it?
Additional information:
  
I am not using the standard request handler. I am using a custom request
handler that uses eDisMax.  
The description_sortAlpha field that the description field is copied to is
a text field *but* it has omitNorms set to true.
My index analyzers for the description field are:
WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory,
LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory, in that order.
My query analyzers for the description field are:
WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory,
WordDelimiterFilterFactory, LowerCaseFilterFactory and
RemoveDuplicatesTokenFilterFactory, in that order (see the sketch after this
list).
The description field is not the only text field to be having this omit
norms issue for the Index row. There are actually a couple of others.
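For reference, a hedged reconstruction of the field type those two chains
describe (the stopwords.txt/synonyms.txt file names and other attribute
values are assumptions, not taken from the actual schema):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>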
Thanks,
-Tim




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-not-respecting-Omit-Norms-tp4153863.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index not respecting Omit Norms

2014-08-19 Thread Chris Hostetter

: As you can see from the first image, the text field-type doesn't define the
: omitNorms flag, meaning it is set to false. Also in the first image you can
: see that the description field doesn't define the omitNorms flag, again
: meaning it is set to false (the default for omitNorms is false). This is all
...
: I am having some issues understanding why some results have a fieldNorm set
: to 1 for matches on the description field. As you can see from the third
...
: From the first picture, can anyone tell me what each row (Properties, Schema
: and Index) refers to? I think the Properties row refers to the flags set
: when defining the field type, which for this field is text. The Schema row
: refers to the flags set when defining the field, which is description. I'm
: less sure where the Index row flags come from, but I'm assuming it reflects
: what the index actually contains?
: Am I right in assuming the Omit Norms flag in the Index row of the first
: picture is what is causing the fieldNorm issues in the third image?
: If I am correct in the above question, how do I fix it?

From a quick glance at the UI JavaScript code (and the underlying 
LukeRequestHandler) I'm honestly not sure what the intended difference is 
between the Properties row and the Schema row.

I can tell you that the Index row represents what information about the 
field can actually be extracted from the underlying index itself -- 
completely independently from the schema.   The fact that Omit Norms is 
checked in that row means that there is at least one document in your 
index that was indexed with omitNorms=true.

Most likely what happened is that you indexed a bunch of docs with 
omitNorms=true in your schema.xml, then later changed your schema to use 
norms, but those docs are still there in the index.
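A quick way to confirm this (a sketch, assuming a single core named
collection1 and the stock Luke handler path; adjust for your setup):

curl "http://localhost:8983/solr/collection1/admin/luke?fl=description&numTerms=0"

If the "index" flags reported for the field include omitNorms, the stale
documents are still there. Since norms live in the segment files, the clean
fix is to reindex everything with the current schema -- either wipe the
index directory, or delete *:* and optimize so the old segments are merged
away, then reload your data.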



-Hoss
http://www.lucidworks.com/


Re: Norms

2013-07-14 Thread Mark Miller

On Jul 10, 2013, at 4:39 AM, Daniel Collins danwcoll...@gmail.com wrote:

 QueryNorm is what I'm still trying to get to the bottom of exactly :) 

If you have not seen it, some reading from the past here…

https://issues.apache.org/jira/browse/LUCENE-1896

- Mark

Re: Norms

2013-07-12 Thread William Bell
Thanks.

Yeah, I don't really want the queryNorm on.


On Wed, Jul 10, 2013 at 2:39 AM, Daniel Collins danwcoll...@gmail.com wrote:

 I don't know the full answer to your question, but here's what I can offer.
 ...


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Norms

2013-07-12 Thread Lance Norskog
Norms stay in the index even if you delete all of the data. If you just 
changed the schema, emptied the index, and tested again, you've still 
got norms in there.


You can examine the index with Luke to verify this.

On 07/09/2013 08:57 PM, William Bell wrote:

I have a field that has omitNorms=true, but when I look at debugQuery I see
that the field is being normalized for the score.
...

Re: Norms

2013-07-10 Thread Daniel Collins
I don't know the full answer to your question, but here's what I can offer.

Solr offers 2 types of normalisation, FieldNorm and QueryNorm.  FieldNorm
is, as the name suggests, field-level normalisation, based on the length of
the field, and can be controlled by the omitNorms parameter on the field.  In
your example, fieldNorm is always 1.0 (see below), so that suggests you have
correctly turned off field normalisation on the name_edgy field.

1.0 = fieldNorm(field=name_edgy, doc=231378)

QueryNorm is what I'm still trying to get to the bottom of exactly :)  But
it's something that tries to normalise the results of different term queries
so they are broadly comparable. You haven't supplied the query you've run,
but based on the qf and bf, I'm assuming it breaks down into a DisMax query
on 3 fields (name_edgy, name_edge, name_word), so queryNorm is trying to
ensure that the results of those 3 queries can be compared.  The exact
details of it I'm still trying to get to the bottom of (any volunteers with
more info, chip in!)

From earlier answers to the list, queryNorm is calculated in the Similarity
object; I need to dig further, but that's probably a good place to start.
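For what it's worth, in DefaultSimilarity queryNorm is simply
1/sqrt(sumOfSquaredWeights), and the usual way to switch it off is to
override it to a constant in a custom Similarity. A minimal sketch (class
and package names are placeholders; the import assumes the Lucene 4.x
org.apache.lucene.search.similarities package -- on 3.x it is
org.apache.lucene.search.DefaultSimilarity):

import org.apache.lucene.search.similarities.DefaultSimilarity;

// Disables query normalization: with queryNorm fixed at 1.0, scores are
// no longer divided by sqrt(sum of squared term weights).
public class NoQueryNormSimilarity extends DefaultSimilarity {
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }
}

Registered in schema.xml with something like
<similarity class="com.example.NoQueryNormSimilarity"/>.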



On 10 July 2013 04:57, William Bell billnb...@gmail.com wrote:

 I have a field that has omitNorms=true, but when I look at debugQuery I see
 that the field is being normalized for the score.
 ...



Norms

2013-07-09 Thread William Bell
I have a field that has omitNorms=true, but when I look at debugQuery I see
that the field is being normalized for the score.

What can I do to turn off normalization in the score?

I want a simple way to do 2 things:

boost geodist() highest at 1 mile and lowest at 100 miles.
plus add a boost for a query=edgefield^5.

I only want tf() and no queryNorm. I am not even sure I want idf() but I
can probably live with rare names being boosted.



The results are being normalized. See below. I tried dismax and edismax -
bf, bq and boost.

<requestHandler name="autoproviderdist" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">none</str>
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="fl">
      display_name,city_state,prov_url,pwid,city_state_alternative
    </str>
    <!--
    <str name="bq">_val_:"sum(recip(geodist(store_geohash), .5, 6, 6),
    0.1)"^10</str>
    -->
    <str name="boost">sum(recip(geodist(store_geohash), .5, 6, 6), 0.1)</str>
    <int name="rows">5</int>
    <str name="q.alt">*:*</str>
    <str name="qf">name_edgy^.9 name_edge^.9 name_word</str>
    <str name="group">true</str>
    <str name="group.field">pwid</str>
    <str name="group.main">true</str>
    <!-- <str name="pf">name_edgy</str> do not turn on -->
    <str name="sort">score desc, last_name asc</str>
    <str name="d">100</str>
    <str name="pt">39.740112,-104.984856</str>
    <str name="sfield">store_geohash</str>
    <str name="hl">false</str>
    <str name="hl.fl">name_edgy</str>
    <str name="mm">2-1 4-2 6-3</str>
  </lst>
</requestHandler>

0.058555886 = queryNorm

product of:
  10.854807 = (MATCH) sum of:
    1.8391232 = (MATCH) max plus 0.01 times others of:
      1.8214592 = (MATCH) weight(name_edge:paul^0.9 in 231378), product of:
        0.30982485 = queryWeight(name_edge:paul^0.9), product of:
          0.9 = boost
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          0.058555886 = queryNorm
        5.8789964 = (MATCH) fieldWeight(name_edge:paul in 231378), product of:
          1.0 = tf(termFreq(name_edge:paul)=1)
          5.8789964 = idf(docFreq=26567, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edge, doc=231378)
      1.7664119 = (MATCH) weight(name_edgy:paul^0.9 in 231378), product of:
        0.30510724 = queryWeight(name_edgy:paul^0.9), product of:
          0.9 = boost
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          0.058555886 = queryNorm
        5.789479 = (MATCH) fieldWeight(name_edgy:paul in 231378), product of:
          1.0 = tf(termFreq(name_edgy:paul)=1)
          5.789479 = idf(docFreq=29055, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
    9.015684 = (MATCH) max plus 0.01 times others of:
      8.9352665 = (MATCH) weight(name_word:nutting in 231378), product of:
        0.72333425 = queryWeight(name_word:nutting), product of:
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          0.058555886 = queryNorm
        12.352887 = (MATCH) fieldWeight(name_word:nutting in 231378), product of:
          1.0 = tf(termFreq(name_word:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_word, doc=231378)
      8.04174 = (MATCH) weight(name_edgy:nutting^0.9 in 231378), product of:
        0.65100086 = queryWeight(name_edgy:nutting^0.9), product of:
          0.9 = boost
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          0.058555886 = queryNorm
        12.352887 = (MATCH) fieldWeight(name_edgy:nutting in 231378), product of:
          1.0 = tf(termFreq(name_edgy:nutting)=1)
          12.352887 = idf(docFreq=40, maxDocs=3493655)
          1.0 = fieldNorm(field=name_edgy, doc=231378)
  1.0855998 = sum(6.0/(0.5*float(geodist(39.74168747663498,-104.9849385023117,39.740112,-104.984856))+6.0),const(0.1))



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Lance Norskog
Which Similarity class do you use for the Lucene code? Solr has a custom one.

On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com wrote:
 So, I've got some code that stores the same documents in a Lucene
 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.
 ...



-- 
Lance Norskog
goks...@gmail.com


Re: Why would solr norms come up different from Lucene norms?

2012-05-05 Thread Benson Margulies
On Sat, May 5, 2012 at 7:59 PM, Lance Norskog goks...@gmail.com wrote:
 Which Similarity class do you use for the Lucene code? Solr has a custom one.

I am embarrassed to report that I also have a custom similarity that I
didn't know about, and once I configured that into Solr all was well.
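For reference, wiring a custom similarity into Solr 3.x is a single line in
schema.xml (the class name here is a placeholder):

<similarity class="com.example.MyCustomSimilarity"/>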



 On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com wrote:
 So, I've got some code that stores the same documents in a Lucene
 3.5.0 index and a Solr 3.5.0 instance. It's only five documents.
 ...


Why would solr norms come up different from Lucene norms?

2012-05-04 Thread Benson Margulies
So, I've got some code that stores the same documents in a Lucene
3.5.0 index and a Solr 3.5.0 instance. It's only five documents.

For a particular field, the Solr norm is always 0.625, while the
Lucene norm is .5.

I've watched the code in NormsWriterPerField in both cases.

In Solr we've got .577, in naked Lucene it's .5.

I tried to check for boosts, and I don't see any non-1.0 document or
field boosts.

The Solr field is:

<field name="bt_rni_NameHRK_encodedName" type="text_ws" indexed="true"
       stored="true" multiValued="false" />


RE: [Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-11 Thread Ivan Hrytsyuk
Thank you guys for responses.

Some background on the task:
The problem we are trying to solve with Solr is the following. 
We have to provide a full-text search over documents that partially consist of 
fields that are always there and partially of additional metadata as key-value 
pairs where keys are not known beforehand. Yet we need to be able to search on 
the content of that additional meta-data.

Because we have to provide FTS abilities we have used Solr and not a HashMap or 
some BigTable.
To address the optionality of additional metadata fields and their 
searchability we have decided to use Solr indexed dynamic fields. 

Questions:
1. Yonik, will your approach work for us with the following data:
doc1
  uniqueFields:[100=boo foo roo,101=bar bar 100 boo]
doc2
  uniqueFields:[101=boo roo,102=bar foo 101 boo]
and we want to fetch documents that contain the value 'foo' in metadata with 
field key 100? (that is, only doc1 should be returned)

2. Should I post an issue to JIRA about the large index size, or is it 
expected behaviour in our case?

Thanks, Ivan
 



From: ysee...@gmail.com [ysee...@gmail.com] On Behalf Of Yonik Seeley 
[yo...@lucidimagination.com]
Sent: Thursday, November 10, 2011 10:22 PM
To: solr-user@lucene.apache.org
Subject: Re: [Solr-3.4] Norms file size is large in case of many unique indexed 
fields in index

You might be able to turn this around and encode the unique field
information in a multi-valued field:
...

-Yonik


[Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-10 Thread Ivan Hrytsyuk
Hello everyone,

We are seeing a large index size when norms are enabled.

schema.xml:

type declaration:
<fieldType name="simpleTokenizer" class="solr.TextField"
           positionIncrementGap="100" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

fields declaration:
<field name="id" stored="true" indexed="true" required="true"
       type="string" />
<field name="name" stored="true" indexed="true" type="string" />
<dynamicField name="unique_*" stored="false" indexed="true"
              type="simpleTokenizer" multiValued="false" />

For 5000 documents (every document has 2 unique fields, 2*5000=10000
unique fields in index), index size is 48.24 MB.
But if we enable omitting norms (omitNorms=true), index size is 0.56
MB.

Next, if we increase the number of unique fields per document to 3
(3*5000=15000 unique fields in index) we receive: 72.23 MB and 0.70 MB
respectively.
And if we increase the number of documents to 10000 (3*10000=30000 unique
fields in index) we receive: 287.54 MB and 1.44 MB respectively.
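For scale, a back-of-envelope check: pre-4.0 Lucene stores norms
non-sparsely, one byte per document for every field with norms, whether or
not a given document has a value for that field, which lines up with the
sizes above almost exactly:

5000 docs  * 10000 fields * 1 byte ~= 47.7 MB   (observed 48.24 MB)
5000 docs  * 15000 fields * 1 byte ~= 71.5 MB   (observed 72.23 MB)
10000 docs * 30000 fields * 1 byte ~= 286.1 MB  (observed 287.54 MB)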

We've prepared a test application to reproduce the mentioned behavior. It
can be downloaded here:
https://bitbucket.org/coldserenity/solr-large-index-with-norms

Could anyone point out whether the index size is as expected in the
mentioned cases? And if it is, what configuration can be applied to reduce
the size of the index.

Thank you in advance, Ivan


Re: [Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-10 Thread Robert Muir
what is the point of a unique indexed field?

If for all of your fields, there is only one possible document, you
don't need length normalization, scoring, or a search engine at all...
just use a HashMap?

On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk
ihryts...@softserveinc.com wrote:
 Hello everyone,

 We are seeing a large index size when norms are enabled.
 ...




-- 
lucidimagination.com


Re: [Solr-3.4] Norms file size is large in case of many unique indexed fields in index

2011-11-10 Thread Yonik Seeley
On Thu, Nov 10, 2011 at 7:42 AM, Ivan Hrytsyuk
ihryts...@softserveinc.com wrote:
 For 5000 documents (every document has 2 unique fields, 2*5000=10000
 unique fields in index), index size is 48.24 MB.

You might be able to turn this around and encode the unique field
information in a multi-valued field:

For example, instead of
  myUniqueField100:foo  myUniqueField101:bar
you could do
  uniqueFields:[100=foo,101=bar]

The exact details depend on how you are going to use/query these
fields of course.
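A sketch of how that might be wired up (field and key names here are
hypothetical; a string type gives exact key=value matching, while
multi-token values would need a tokenized type that keeps the key prefix on
every token):

<field name="uniqueFields" type="string" indexed="true" stored="false"
       multiValued="true"/>

<!-- index time: uniqueFields = ["100=foo", "101=bar"] -->
<!-- query time: q=uniqueFields:100=foo -->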

-Yonik
http://www.lucidimagination.com


Re: Norms - scoring issue

2011-09-15 Thread Ahmet Arslan
It seems that the fieldNorm difference is coming from the field named
'text', and you didn't include the definition of the 'text' field. Did you
omit norms for that field too?


By the way, I see that you have store=true in some places, but it should be
store*d*=true.
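If that 'text' field is the catch-all default search field, the same flag
applies there too -- a sketch of the usual schema.xml shape (attribute
values assumed; downthread the fix turned out to be setting omitNorms on
the fieldType definition):

<field name="text" type="text" indexed="true" stored="false"
       multiValued="true" omitNorms="true"/>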

--- On Wed, 9/14/11, Adolfo Castro Menna adolfo.castrome...@gmail.com wrote:

 From: Adolfo Castro Menna adolfo.castrome...@gmail.com
 Subject: Norms - scoring issue
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 14, 2011, 11:13 PM
 Hi All,

 I hope someone could shed some light on the issue I'm facing with solr
 3.1.0. It looks like it's computing different fieldNorm values despite my
 configuration that aims to ignore it.
 ...

Re: Norms - scoring issue

2011-09-15 Thread Adolfo Castro Menna
Hi Ahmet,

You're right. It was related to the text field, which is the default search
field. I also added omitNorms=true in the fieldType definition and it's now
working as expected.

Thanks,
Adolfo.


Norms - scoring issue

2011-09-14 Thread Adolfo Castro Menna
Hi All,

I hope someone could shed some light on the issue I'm facing with solr
3.1.0. It looks like it's computing different fieldNorm values despite my
configuration that aims to ignore it.

   <field name="item_name" type="textgen" indexed="true" store="true"
          omitNorms="true" omitTermFrequencyAndPositions="true" />
   <field name="item_description" type="textTight" indexed="true"
          store="true" omitNorms="true" omitTermFrequencyAndPositions="true" />
   <field name="item_tags" type="text" indexed="true" stored="true"
          multiValued="true" omitNorms="true" omitTermFrequencyAndPositions="true" />

I also have a custom class that extends DefaultSimilarity to override the
idf method.

Query:

<str name="q">item_name:octopus seafood OR item_description:octopus seafood
OR item_tags:octopus seafood</str>
<str name="sort">score desc,item_ranking desc</str>

The first 2 results are:
<doc>
  <float name="score">0.5217492</float>
  <str name="item_name">Grilled Octopus</str>
  <arr name="item_tags"><str>Seafood, tapas</str></arr>
</doc>
<doc>
  <float name="score">0.49379835</float>
  <str name="item_name">octopus marisco</str>
  <arr name="item_tags"><str>Appetizer, Mexican, Seafood, food</str></arr>
</doc>

Does anyone know why they get a different score? I'm expecting them to have
the same scoring because both matched the two search terms.

I checked the debug information and it seems that the difference involves
the fieldNorm values.

1) Grilled Octopus
0.52174926 = (MATCH) product of:
  0.7826238 = (MATCH) sum of:
    0.4472136 = (MATCH) weight(item_name:octopus in 69), product of:
      0.4472136 = queryWeight(item_name:octopus), product of:
        1.0 = idf(docFreq=2, maxDocs=449)
        0.4472136 = queryNorm
      1.0 = (MATCH) fieldWeight(item_name:octopus in 69), product of:
        1.0 = tf(termFreq(item_name:octopus)=1)
        1.0 = idf(docFreq=2, maxDocs=449)
        1.0 = fieldNorm(field=item_name, doc=69)
    0.1118034 = (MATCH) weight(text:seafood in 69), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.25 = fieldNorm(field=text, doc=69)
    0.1118034 = (MATCH) weight(text:seafood in 69), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.25 = fieldNorm(field=text, doc=69)
    0.1118034 = (MATCH) weight(text:seafood in 69), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.25 = (MATCH) fieldWeight(text:seafood in 69), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.25 = fieldNorm(field=text, doc=69)
  0.667 = coord(4/6)

2) octopus marisco

0.49379835 = (MATCH) product of:
  0.7406975 = (MATCH) sum of:
    0.4472136 = (MATCH) weight(item_name:octopus in 81), product of:
      0.4472136 = queryWeight(item_name:octopus), product of:
        1.0 = idf(docFreq=2, maxDocs=449)
        0.4472136 = queryNorm
      1.0 = (MATCH) fieldWeight(item_name:octopus in 81), product of:
        1.0 = tf(termFreq(item_name:octopus)=1)
        1.0 = idf(docFreq=2, maxDocs=449)
        1.0 = fieldNorm(field=item_name, doc=81)
    0.09782797 = (MATCH) weight(text:seafood in 81), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.21875 = fieldNorm(field=text, doc=81)
    0.09782797 = (MATCH) weight(text:seafood in 81), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.21875 = fieldNorm(field=text, doc=81)
    0.09782797 = (MATCH) weight(text:seafood in 81), product of:
      0.4472136 = queryWeight(text:seafood), product of:
        1.0 = idf(docFreq=8, maxDocs=449)
        0.4472136 = queryNorm
      0.21875 = (MATCH) fieldWeight(text:seafood in 81), product of:
        1.0 = tf(termFreq(text:seafood)=1)
        1.0 = idf(docFreq=8, maxDocs=449)
        0.21875 = fieldNorm(field=text, doc=81)
  0.667 = coord(4/6)

Thanks in advance,
Adolfo.


Re: Omitting norms question

2010-03-19 Thread Marc Sturlese

Should I include not omit-norms on any fields that I would like to boost
via a boost-query/function query?
You don't have to set norms to use boost queries or functions. Just have to
set them when you want to boost docs or fields at indexing time.

What about sortable fields? Facetable fields?
You can use both without setting norms as well.

See what norms are for:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#lengthNorm%28java.lang.String,%20int%29
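For reference, the index-time boosts that get folded into norms are the
boost attributes in the XML update format -- the values here are arbitrary
examples:

<add>
  <doc boost="2.5">
    <field name="id">1</field>
    <field name="name" boost="3.0">some boosted name</field>
  </doc>
</add>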


blargy wrote:
 
 Should I include not omit-norms on any fields that I would like to boost
 via a boost-query/function query?
 
 For example I have a created_on field on one of my documents and I would
 like to add some sort of function query to this field when querying. In
 this case does this mean I need to have the norms?
 
 What about sortable fields? Facetable fields?
 
 Thanks!
 

-- 
View this message in context: 
http://old.nabble.com/Omitting-norms-question-tp27950893p27950919.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Omitting norms question

2010-03-19 Thread blargy

Ok, so if I wanted to add boost to fields at indexing time then I should
include norms. On the other hand, if I just want to boost at query time then
it's quite alright to omit norms. 

Anyone mind explaining what norms are in layman's terms ;)


Marc Sturlese wrote:
 
 Should I include not omit-norms on any fields that I would like to boost
 via a boost-query/function query?
 You don't have to set norms to use boost queries or functions. Just have
 to set them when you want to boost docs or fields at indexing time.
 ...

-- 
View this message in context: 
http://old.nabble.com/Omitting-norms-question-tp27950893p27950977.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Omitting norms question

2010-03-19 Thread Steven A Rowe
Hi blargy,

Norms are:

- a field-specific multiplicative document scoring factor

- the product of three factors: user-settable 1) field boost and 2) document 
boost (both default to 1.0), along with the 3) field length norm, defined in 
DefaultSimilarity as 1/sqrt(# terms).

- encoded as a positive 8-bit float - range: 6x10^-10 to 7x10^9; accuracy: 
about 7/10's of a decimal digit.  (I have a table of all 256 possible values if 
you're interested.)

Check out the (fuller, less buggy, and way shinier) explanation at the top of 
the javadocs page that Marc sent the link to.
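A condensed sketch of how those factors combine, against the Lucene
2.x/3.x-era static API (the lossy one-byte encoding is why nearby boost
values can collapse to the same stored norm):

import org.apache.lucene.search.Similarity;

class NormDemo {
    // What actually gets stored for a field: boosts times length norm,
    // squeezed through the 8-bit encoding and back.
    static float storedNorm(float fieldBoost, float docBoost, int numTerms) {
        float lengthNorm = (float) (1.0 / Math.sqrt(numTerms)); // DefaultSimilarity
        float norm = fieldBoost * docBoost * lengthNorm;
        byte encoded = Similarity.encodeNorm(norm);  // lossy 8-bit float
        return Similarity.decodeNorm(encoded);       // what scoring sees
    }
}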

Steve

On 03/19/2010 at 10:51 AM, blargy wrote:
 
 Ok, so if I wanted to add boost to fields at indexing time then I
 should include norms. On the other hand, if I just want to boost at query
 time then it's quite alright to omit norms.
 ...




Changing encoding norms and boosting...

2007-03-29 Thread escher2k

This is related to an earlier posting
(http://www.nabble.com/Document-boost-not-as-expected...-tf3476653.html).
I am trying to determine a ranking for users that is between 1 and 1.5.
Because of the way the norm is encoded and stored, if index-time boosting is
done, everyone gets a score of 1, 1.25 or 1.5. Is there any way to get around
this so that all the values can be retrieved as is (e.g. 1.22, 1.35 etc.)?
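One way to see exactly which boost values survive the one-byte encoding is
to round-trip them (a sketch against the Lucene 2.x-era static API):

for (float f = 1.0f; f <= 1.5f; f += 0.01f) {
    byte b = org.apache.lucene.search.Similarity.encodeNorm(f);
    System.out.println(f + " -> " + org.apache.lucene.search.Similarity.decodeNorm(b));
}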

Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Changing-encoding-norms-and-boosting...-tf3489245.html#a9744212
Sent from the Solr - User mailing list archive at Nabble.com.