Boost Strangeness

2011-06-18 Thread Judioo

WONDERFUL!
Just reporting back.
This document is ACE for explaining what the filters are and how to
affect the analyzer:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Erick, your statement "First, boosting isn't absolute" played on my mind, so
I continued to investigate boosting.

I found this document that ( at last ) explains the dismax logic

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

The reason why I was not getting the order I required was twofold:
a) my boost values were too close together.
b) similar IDs in a document affected the score.


It seems that if a partial match is made, the product ( a % of the
total boost ) contributes to the document's score.
This meant that one type of document in the index had a higher
aggregate score because it had all but one of the boosted
fields ( it does not have parent_id ) and those fields were
populated with content that was *very* similar to the requested id.

For example:

required id = b011mg62
X_id = b011mgsf

Due to the partial matching and the closeness of the boost values, this
type of document always acquired a higher score than another document
with just one matching field ( i.e. the id field ).

My solution was to increase the value of the fields I wanted to *really* count

id^10 parent_id^5000 brand_container_id^500 

As a result even if there are similar matches in any field the id and
parent_id matches should always receive a higher boost.
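How wide the gap between boosts needs to be can be estimated from the explain output posted earlier in this thread. The sketch below is only a rough illustration: it assumes the score of a single matching clause is boost x idf x queryNorm x fieldWeight (as that explain output shows), reuses the numbers from that output, and uses made-up helper names (`unboosted`, `id_wins`). Since queryNorm is the same for every document in one query, it cancels out of the comparison.

```python
# Numbers copied from the debugQuery explain output posted in this thread.
PHRASE_IDF = 49.55458           # idf(subseries_container_id: b=547 007=31 vty=1 6=87)
PHRASE_FIELD_WEIGHT = 24.77729  # fieldWeight of the subseries_container_id match
ID_IDF = 15.431123              # idf(docFreq=1, maxDocs=3701577)
ID_FIELD_WEIGHT = 15.431123     # fieldWeight of the id match


def unboosted(idf: float, field_weight: float) -> float:
    """Clause score with boost and queryNorm factored out (hypothetical helper)."""
    return idf * field_weight


def id_wins(id_boost: float, other_boost: float) -> bool:
    """True when the exact id match outranks the tokenized-field match."""
    id_score = id_boost * unboosted(ID_IDF, ID_FIELD_WEIGHT)
    other_score = other_boost * unboosted(PHRASE_IDF, PHRASE_FIELD_WEIGHT)
    return id_score > other_score


# Smallest ratio id_boost / other_boost at which the id match comes first.
min_ratio = unboosted(PHRASE_IDF, PHRASE_FIELD_WEIGHT) / unboosted(ID_IDF, ID_FIELD_WEIGHT)
print(f"id boost must be about {min_ratio:.2f}x the other boost")
print(id_wins(10, 8))    # False: boosts of 10 and 8 are too close together
print(id_wins(5000, 8))  # True: a much wider gap
```

With these particular numbers a ratio of roughly 5 is the tipping point, which is why boosts of 10 versus 8 were not enough. Other documents and fields will have different idf and fieldNorm values, so the threshold is not a general constant.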


This was also useful
http://stackoverflow.com/questions/2179497/adding-date-boosting-to-complex-solr-queries


Thanks for the help!

Re: Boost Strangeness

2011-06-16 Thread Judioo

Fascinating.

Thank you so much Erick, I'm slowly beginning to understand.

So I've discovered that by defining 'splitOnNumerics=0' on the filter
class 'solr.WordDelimiterFilterFactory' ( for ONLY the query analyzer ) I
can get *closer* to my required goal!

Now something else odd is occurring.

It only returns 2 results where there are over 70?

Why is that? I can't find where this is explained :(

query

/solr/select?omitNorms=true&q=b006m86d&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on&omitNorms=true

output

{
   responseHeader: {
      status: 0
      QTime: 51
      params: {
         debugQuery: on
         fl: type,id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score
         indent: on
         q: b006m86d
         qf: id^10 parent_id^9 brand_container_id^8 series_container_id^8 subseries_container_id^8 clip_container_id^1 clip_episode_id^1
         wt: json
         omitNorms: [
            true
            true
         ]
         defType: dismax
      }
   }
   response: {
      numFound: 2
      start: 0
      maxScore: 13.473297
      docs: [
         {
            parent_id:
            id: b006m86d
            type: brand
            score: 13.473297
         }
         {
            series_container_id:
            id: b00y1w9h
            type: episode
            brand_container_id: b006m86d
            subseries_container_id:
            clip_episode_id:
            score: 11.437143
         }
      ]
   }
   debug: {
      rawquerystring: b006m86d
      querystring: b006m86d
      parsedquery: +DisjunctionMaxQuery((id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0)) ()
      parsedquery_toString: +(id:b006m86d^10.0 | clip_episode_id:b006m86d | subseries_container_id:b006m86d^8.0 | series_container_id:b006m86d^8.0 | clip_container_id:b006m86d | brand_container_id:b006m86d^8.0 | parent_id:b006m86d^9.0) ()
      explain: {
         b006m86d:
            13.473297 = (MATCH) sum of:
               13.473297 = (MATCH) max of:
                  13.473297 = (MATCH) fieldWeight(id:b006m86d in 27636), product of:
                     1.0 = tf(termFreq(id:b006m86d)=1)
                     13.473297 = idf(docFreq=2, maxDocs=783800)
                     1.0 = fieldNorm(field=id, doc=27636)
         b00y1w9h:
            11.437143 = (MATCH) sum of:
               11.437143 = (MATCH) max of:
                  11.437143 = (MATCH) weight(brand_container_id:b006m86d^8.0 in 61), product of:
                     0.82407516 = queryWeight(brand_container_id:b006m86d^8.0), product of:
                        8.0 = boost
                        13.878762 = idf(docFreq=1, maxDocs=783800)
                        0.007422088 = queryNorm
                     13.878762 = (MATCH) fieldWeight(brand_container_id:b006m86d in 61), product of:
                        1.0 = tf(termFreq(brand_container_id:b006m86d)=1)
                        13.878762 = idf(docFreq=1, maxDocs=783800)
                        1.0 = fieldNorm(field=brand_container_id, doc=61)
      }
      QParser: DisMaxQParser
      altquerystring: null
      boostfuncs: null
      timing: {
         time: 51
         prepare: {
            time: 6
            org.apache.solr.handler.component.QueryComponent: { time: 5 }
            org.apache.solr.handler.component.FacetComponent: { time: 0 }
            org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
            org.apache.solr.handler.component.HighlightComponent: { time: 1 }
            org.apache.solr.handler.component.StatsComponent: { time: 0 }
            org.apache.solr.handler.component.DebugComponent: { time: 0 }
         }
         process: {
            time: 45
            org.apache.solr.handler.component.QueryComponent: { time: 27 }
            org.apache.solr.handler.component.FacetComponent: { time: 0 }
            org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
            org.apache.solr.handler.component.HighlightComponent: { time: 0 }
            org.apache.solr.handler.component.StatsComponent: {

Re: Boost Strangeness

2011-06-16 Thread Erick Erickson

Right, if you've only changed WordDelimiterFilterFactory in the query, then
the tokens you're analyzing may be split up. Try running some of the
terms through the admin/analysis page. Unless you have
catenateAll=1 in the definition, the whole term won't be there.

It becomes a question of why you even want WDFF in there in the first
place, do you ever want to split these fields up this way? Maybe start
by just taking it out completely?
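The mismatch described above can be mimicked with a toy splitter. This is only a rough stand-in for WordDelimiterFilterFactory, not Solr's implementation: the function name and its two flags are hypothetical, and it handles nothing but letter/digit transitions. It is meant to show why a term left whole at query time cannot match an index that only holds the split parts.

```python
import re
from typing import List


def toy_word_delimiter(token: str, split_on_numerics: bool = True,
                       catenate_all: bool = False) -> List[str]:
    """Toy approximation of splitting an id on letter/digit boundaries."""
    if not split_on_numerics:
        return [token]  # the whole id survives as a single term
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if catenate_all and len(parts) > 1:
        parts.append("".join(parts))  # also keep the re-joined whole term
    return parts


# Index side still splits; query side was changed to keep the id whole.
index_terms = toy_word_delimiter("b006m86d")
query_terms = toy_word_delimiter("b006m86d", split_on_numerics=False)
print(index_terms)   # ['b', '006', 'm', '86', 'd']
print(query_terms)   # ['b006m86d']
print(all(t in index_terms for t in query_terms))  # False: nothing to match

# With the whole term also indexed, the query term can be found again.
index_terms_cat = toy_word_delimiter("b006m86d", catenate_all=True)
print(all(t in index_terms_cat for t in query_terms))  # True
```

The admin/analysis page remains the authoritative way to see what the real filter chain produces for a given field type.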

Best
Erick

Boost Strangeness

2011-06-15 Thread Judioo

Hi

I'm confused about exactly how boosts and relevancy scores work.

Apologies if I am violating this group's etiquette but I could not find
Solr's pastebin anywhere.

I have 2 document types but want to return any documents where the requested
ID appears. The ID appears in multiple attributes but I want to boost
results based on which attribute contains the ID.

so my query is

q=id:b007vty6 parent_id:b007vty6 brand_container_id:b007vty6
series_container_id:b007vty6 subseries_container_id:b007vty6
clip_container_id:b007vty6 clip_episode_id:b007vty6

and I use qf to boost fields

qf=id^10 parent_id^9 brand_container_id^8 series_container_id^8
subseries_container_id^8 clip_container_id^1 clip_episode_id^1


I expect any document with the following id:b007vty6 to be returned 1st (
with the highest score ) yet this is not the case. Can anyone explain why
this is? Could it be that


extra info below:

complete URL

/solr/select/?q=id:b007vty6%20parent_id:b007vty6%20brand_container_id:b007vty6%20series_container_id:b007vty6%20subseries_container_id:b007vty6%20clip_container_id:b007vty6%20clip_episode_id:b007vty6&start=0&rows=10&wt=json&indent=on&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1

results

{
   responseHeader: {
      status: 0
      QTime: 12
      params: {
         debugQuery: on
         fl: id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score
         indent: on
         start: 0
         q: id:b007vty6 parent_id:b007vty6 brand_container_id:b007vty6 series_container_id:b007vty6 subseries_container_id:b007vty6 clip_container_id:b007vty6 clip_episode_id:b007vty6
         qf: id^10 parent_id^9 brand_container_id^8 series_container_id^8 subseries_container_id^8 clip_container_id^1 clip_episode_id^1
         wt: json
         rows: 10
      }
   }
   response: {
      numFound: 2
      start: 0
      maxScore: 1.5543144
      docs: [
         {
            series_container_id: b007vm94
            id: b007vsvm
            brand_container_id: b007hhk5
            subseries_container_id: b007vty6
            clip_episode_id:
            score: 1.5543144
         }
         {
            parent_id: b007vm94
            id: b007vty6
            score: 0.3014368
         }
      ]
   }
   debug: {
      rawquerystring: id:b007vty6 parent_id:b007vty6 brand_container_id:b007vty6 series_container_id:b007vty6 subseries_container_id:b007vty6 clip_container_id:b007vty6 clip_episode_id:b007vty6
      querystring: id:b007vty6 parent_id:b007vty6 brand_container_id:b007vty6 series_container_id:b007vty6 subseries_container_id:b007vty6 clip_container_id:b007vty6 clip_episode_id:b007vty6
      parsedquery: id:b007vty6 PhraseQuery(parent_id:b 007 vty 6) PhraseQuery(brand_container_id:b 007 vty 6) PhraseQuery(series_container_id:b 007 vty 6) PhraseQuery(subseries_container_id:b 007 vty 6) PhraseQuery(clip_container_id:b 007 vty 6) PhraseQuery(clip_episode_id:b 007 vty 6)
      parsedquery_toString: id:b007vty6 parent_id:b 007 vty 6 brand_container_id:b 007 vty 6 series_container_id:b 007 vty 6 subseries_container_id:b 007 vty 6 clip_container_id:b 007 vty 6 clip_episode_id:b 007 vty 6
      explain: {
         b007vsvm:
            1.5543144 = (MATCH) product of:
               10.8802 = (MATCH) sum of:
                  10.8802 = (MATCH) weight(subseries_container_id:b 007 vty 6 in 39526), product of:
                     0.43911988 = queryWeight(subseries_container_id:b 007 vty 6), product of:
                        49.55458 = idf(subseries_container_id: b=547 007=31 vty=1 6=87)
                        0.008861338 = queryNorm
                     24.77729 = fieldWeight(subseries_container_id:b 007 vty 6 in 39526), product of:
                        1.0 = tf(phraseFreq=1.0)
                        49.55458 = idf(subseries_container_id: b=547 007=31 vty=1 6=87)
                        0.5 = fieldNorm(field=subseries_container_id, doc=39526)
               0.14285715 = coord(1/7)
         b007vty6:
            0.3014368 = (MATCH) product of:
               2.1100576 = (MATCH) sum of:
                  2.1100576 = (MATCH) weight(id:b007vty6 in 39512), product of:
                     0.13674039 = queryWeight(id:b007vty6), product of:
                        15.431123 = idf(docFreq=1, maxDocs=3701577)
                        0.008861338 = queryNorm
                     15.431123 = (MATCH) fieldWeight(id:b007vty6 in 39512), product of:
                        1.0 = tf(termFreq(id:b007vty6)=1)
                        15.431123 = idf(docFreq=1, maxDocs=3701577)
                        1.0 = fieldNorm(field=id, doc=39512)
               0.14285715 = coord(1/7)
      }
      QParser: LuceneQParser
      timing: {
         time: 12
         prepare: {
            time: 3
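The idf figures in the explain output above can be reproduced by hand. The sketch below assumes the classic Lucene formula idf = 1 + ln(maxDocs / (docFreq + 1)) and that a phrase's idf is the sum of its terms' idfs; the helper name is made up for illustration. It suggests why the tokenized match starts with a much larger idf than the single-term id match: four term idfs are added together.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Classic Lucene-style idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


MAX_DOCS = 3701577  # maxDocs reported in the explain output

# Single-term match on the untokenized id field (docFreq=1).
id_idf = idf(1, MAX_DOCS)

# Phrase match on "b 007 vty 6": docFreqs b=547, 007=31, vty=1, 6=87.
phrase_idf = sum(idf(df, MAX_DOCS) for df in (547, 31, 1, 87))

print(f"id idf     ~ {id_idf:.4f}")      # explain output reports 15.431123
print(f"phrase idf ~ {phrase_idf:.4f}")  # explain output reports 49.55458
```

Solr computes these in single precision, so the last digits may differ slightly from what Python prints.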

Re: Boost Strangeness

2011-06-15 Thread Ahmet Arslan

> I have 2 document types but want to return any documents where the
> requested ID appears. The ID appears in multiple attributes but I want
> to boost results based on which attribute contains the ID.
>
> so my query is
>
> q=id:b007vty6 parent_id:b007vty6 brand_container_id:b007vty6
> series_container_id:b007vty6 subseries_container_id:b007vty6
> clip_container_id:b007vty6 clip_episode_id:b007vty6
>
> and I use qf to boost fields
>
> qf=id^10 parent_id^9 brand_container_id^8 series_container_id^8
> subseries_container_id^8 clip_container_id^1 clip_episode_id^1

There is a misunderstanding here. The qf parameter is specific to the
(e)dismax query parser plugin. For more information about it, please see:

http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/

Your query string can be something like this:

defType=dismax&q=b007vty6&qf=id^10 parent_id^9 brand_container_id^8 ...

It automatically expands your simple word query to multiple fields.
defType=dismax is required to enable it, either in the URL or in
solrconfig.xml (defaults section).
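For anyone building such a request programmatically, here is a minimal sketch using only Python's standard library. The host, port and the shortened field lists are assumptions for illustration; the parameter names (defType, q, qf, fl, debugQuery, wt) are the ones used in this thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust to your own installation.
BASE_URL = "http://localhost:8983/solr/select"

params = {
    "defType": "dismax",   # enables the dismax parser so that qf is honoured
    "q": "b007vty6",       # plain term; dismax expands it across the qf fields
    "qf": "id^10 parent_id^9 brand_container_id^8",
    "fl": "id,parent_id,score",
    "debugQuery": "on",
    "wt": "json",
}

# urlencode inserts the '&' separators and escapes spaces, '^' and ','.
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

Letting the library join the parameters avoids hand-assembled URLs in which a separator or an escape is easy to lose.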

Re: Boost Strangeness

2011-06-15 Thread Judioo

Apologies
I have tried that method as well.

/solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on


same result ( just higher scores ). It's almost as if partial matches on
brand|series_container_id and id are being considered in the 1st document.
Surely this can't be right / expected?

{
   responseHeader: {
      status: 0
      QTime: 13
      params: {
         debugQuery: on
         fl: id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score
         indent: on
         q: b007vty6
         qf: id^10 parent_id^9 brand_container_id^8 series_container_id^8 subseries_container_id^8 clip_container_id^1 clip_episode_id^1
         wt: json
         defType: dismax
      }
   }
   response: {
      numFound: 2
      start: 0
      maxScore: 21.138214
      docs: [
         {
            series_container_id: b007vm94
            id: b007vsvm
            brand_container_id: b007hhk5
            subseries_container_id: b007vty6
            clip_episode_id:
            score: 21.138214
         }
         {
            parent_id: b007vm94
            id: b007vty6
            score: 5.1243143
         }
      ]
   }
   debug: {
      rawquerystring: b007vty6
      querystring: b007vty6
      parsedquery: +DisjunctionMaxQuery((id:b007vty6^10.0 | clip_episode_id:b 007 vty 6 | subseries_container_id:b 007 vty 6^8.0 | series_container_id:b 007 vty 6^8.0 | clip_container_id:b 007 vty 6 | brand_container_id:b 007 vty 6^8.0 | parent_id:b 007 vty 6^9.0)) ()
      parsedquery_toString: +(id:b007vty6^10.0 | clip_episode_id:b 007 vty 6 | subseries_container_id:b 007 vty 6^8.0 | series_container_id:b 007 vty 6^8.0 | clip_container_id:b 007 vty 6 | brand_container_id:b 007 vty 6^8.0 | parent_id:b 007 vty 6^9.0) ()
      explain: {
         b007vsvm:
            21.138214 = (MATCH) sum of:
               21.138214 = (MATCH) max of:
                  21.138214 = (MATCH) weight(subseries_container_id:b 007 vty 6^8.0 in 39526), product of:
                     0.85312855 = queryWeight(subseries_container_id:b 007 vty 6^8.0), product of:
                        8.0 = boost
                        49.55458 = idf(subseries_container_id: b=547 007=31 vty=1 6=87)
                        0.0021519922 = queryNorm
                     24.77729 = fieldWeight(subseries_container_id:b 007 vty 6 in 39526), product of:
                        1.0 = tf(phraseFreq=1.0)
                        49.55458 = idf(subseries_container_id: b=547 007=31 vty=1 6=87)
                        0.5 = fieldNorm(field=subseries_container_id, doc=39526)
         b007vty6:
            5.1243143 = (MATCH) sum of:
               5.1243143 = (MATCH) max of:
                  5.1243143 = (MATCH) weight(id:b007vty6^10.0 in 39512), product of:
                     0.33207658 = queryWeight(id:b007vty6^10.0), product of:
                        10.0 = boost
                        15.431123 = idf(docFreq=1, maxDocs=3701577)
                        0.0021519922 = queryNorm
                     15.431123 = (MATCH) fieldWeight(id:b007vty6 in 39512), product of:
                        1.0 = tf(termFreq(id:b007vty6)=1)
                        15.431123 = idf(docFreq=1, maxDocs=3701577)
                        1.0 = fieldNorm(field=id, doc=39512)
      }
      QParser: DisMaxQParser
      altquerystring: null
      boostfuncs: null
      timing: {
         time: 13
         prepare: {
            time: 3
            org.apache.solr.handler.component.QueryComponent: { time: 3 }
            org.apache.solr.handler.component.FacetComponent: { time: 0 }
            org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
            org.apache.solr.handler.component.HighlightComponent: { time: 0 }
            org.apache.solr.handler.component.StatsComponent: { time: 0 }
            org.apache.solr.handler.component.DebugComponent: { time: 0 }
         }
         process: {
            time: 10
            org.apache.solr.handler.component.QueryComponent: { time: 0 }
            org.apache.solr.handler.component.FacetComponent: { time: 0 }
            org.apache.solr.handler.component.MoreLikeThisComponent: { time: 0 }
            org.apache.solr.handler.component.HighlightComponent: { time: 0 }
            org.apache.solr.handler.component.StatsComponent: {
               time: 0
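The two scores in this explain output can be rebuilt from their listed factors, which makes it easier to see where the ranking comes from. The sketch below simply multiplies the numbers shown above (weight = queryWeight x fieldWeight, with queryWeight = boost x idf x queryNorm); the function name is a made-up helper, not part of Solr.

```python
def clause_score(boost: float, idf: float, query_norm: float,
                 field_weight: float) -> float:
    """weight = queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm."""
    query_weight = boost * idf * query_norm
    return query_weight * field_weight


QUERY_NORM = 0.0021519922  # identical for both documents in this query

# Document b007vsvm: phrase match on subseries_container_id, boost 8.
subseries_doc = clause_score(8.0, 49.55458, QUERY_NORM, 24.77729)

# Document b007vty6: exact match on id, boost 10.
id_doc = clause_score(10.0, 15.431123, QUERY_NORM, 15.431123)

print(f"subseries match: {subseries_doc:.4f}")  # reported score: 21.138214
print(f"id match:        {id_doc:.4f}")         # reported score: 5.1243143
```

The boost of 10 versus 8 is the smallest difference among the factors; the idf and fieldWeight of the tokenized field dominate the result.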

Re: Boost Strangeness

2011-06-15 Thread Ahmet Arslan

> /solr/select/?q=b007vty6&defType=dismax&qf=id^10%20parent_id^9%20brand_container_id^8%20series_container_id^8%20subseries_container_id^8%20clip_container_id^1%20clip_episode_id^1&debugQuery=on&fl=id,parent_id,brand_container_id,series_container_id,subseries_container_id,clip_episode_id,clip_episode_id,score&wt=json&indent=on
>
> same result ( just higher scores ). It's almost as if partial matches on
> brand|series_container_id and id are being considered in the 1st document.
> Surely this can't be right / expected?

What is your fieldType definition? Don't you think it would be better to use
the string type, which is not tokenized?

Re: Boost Strangeness

2011-06-15 Thread Judioo

<dynamicField name="*_id" type="text" indexed="true" stored="true"/>

so all attributes except 'id' are of type text.

I didn't know that about the string type. So is my problem as described (
that partial matches are contributing to the calculation ), and does defining
the field type as string solve this problem?

Or is my understanding completely incorrect?

Thanks in advance


Re: Boost Strangeness

2011-06-15 Thread Judioo

String also does not seem to accept spaces. Currently the _id fields can
contain multiple ids ( I'm using this as a multiType alternative ). This is
why I used the text type.


Re: Boost Strangeness

2011-06-15 Thread Erick Erickson

First off, you didn't violate the group's etiquette. In fact, yours was
one of the better first posts in terms of providing enough information
for us to actually help!

A very useful page is the admin/analysis page to see how the
analysis chain works. For instance, if you haven't changed the
field type (i.e. <fieldType name="text">), you'll see that your input is
being broken up by WordDelimiterFilterFactory. Be sure to check
the verbose checkbox and enter text in both the query and
index boxes!

Here's an invaluable page, though do note that it's not exhaustive:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

But on to your problem:

First, boosting isn't absolute; boosting terms just tends to
bubble things up, so you have to experiment with various weights.

To get the full comparison for both documents you're curious about,
try using explainOther. See:
http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_doesn.27t_document_id:juggernaut_appear_in_the_top_10_results_for_my_query

If you use that against the two docs in question, you should
see (although it's a hard read!) the reason the docs got
their relative scores.

Finally, your next e-mail hints at what's happening. If you're
putting multiple tokens in some of these fields, the length
normalization may be causing the matches to score lower. You can
try disabling those calculations (omitNorms=true in your field definition).
See:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr
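The effect of length normalization can be illustrated with a few lines of arithmetic. The sketch below assumes the classic Lucene default of 1 / sqrt(number of terms in the field) and a hypothetical helper name; the value Solr actually stores is a lossy one-byte encoding, so real fieldNorm values are only approximately equal to this.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic Lucene-style length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


# A field holding a single untokenized id.
print(length_norm(1))  # 1.0

# An id split into four tokens ("b 007 vty 6").
print(length_norm(4))  # 0.5

# A field holding three such ids, i.e. twelve tokens.
print(round(length_norm(12), 3))
```

A norm of 0.5 for a four-token field is consistent with the fieldNorm reported for subseries_container_id in the explain output of this thread. With omitNorms=true this factor is no longer applied, so longer fields are not penalized.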

String types accept spaces just fine, but you might want to define
the fields with multiValued="true" and index each ID as a separate
field value (note that won't work with a field that's also your uniqueKey).
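The difference between one space-separated string and separate values can be sketched outside Solr. The example below is a toy model, not Solr code: the documents are plain Python dicts, the combination of ids in them is invented for illustration, and `exact_match` is a hypothetical helper that mimics how an untokenized string field only matches on a whole value.

```python
from typing import Dict, List, Union

Doc = Dict[str, Union[str, List[str]]]

# All ids packed into one string: an untokenized field sees a single value.
packed_doc: Doc = {"id": "b007vsvm",
                   "subseries_container_id": "b007vty6 b007vm94"}

# Each id sent as its own value, as a multiValued field would hold them.
multi_doc: Doc = {"id": "b007vsvm",
                  "subseries_container_id": ["b007vty6", "b007vm94"]}


def exact_match(doc: Doc, field: str, wanted: str) -> bool:
    """Toy model of an untokenized match: the whole value must equal the term."""
    values = doc.get(field, [])
    if isinstance(values, str):
        values = [values]
    return wanted in values


print(exact_match(packed_doc, "subseries_container_id", "b007vty6"))  # False
print(exact_match(multi_doc, "subseries_container_id", "b007vty6"))   # True
```

In this model the packed string never equals a single id, while the multi-valued form matches each id exactly and nothing else, which removes the partial matches between similar ids.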

Best
Erick

