Re: Trouble boosting a field -solved-

2017-01-18 Thread Erick Erickson
bq: Which might be the same as saying nothing matched

Right, a score of zero for a doc means it didn't match the query.
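
A toy sketch of why adding AND can turn a hit into a non-match. This is not Solr code; the `matches` helper and the sample document are made up for illustration, and real query parsing has more cases than this. It only shows that once clauses are required, a document that misses any one of them no longer matches at all.

```python
def matches(doc, required=(), optional=()):
    """Toy boolean matching: `doc` maps a field name to a set of terms.

    Clauses are (field, term) pairs. Every required clause must hit; if there
    are no required clauses, at least one optional clause must hit.
    """
    def hit(clause):
        field, term = clause
        return term in doc.get(field, set())

    if required:
        return all(hit(c) for c in required)
    return any(hit(c) for c in optional)


# Hypothetical document: the phrase appears in the title only.
doc = {"title": {"connected vehicle"}, "content": {"news"}}

title_clause = ("title", "connected vehicle")
content_clause = ("content", "connected vehicle")

# title AND content: both clauses are required, so the doc does not match.
and_result = matches(doc, required=[title_clause, content_clause])
# title OR content: either clause is enough, so the doc matches.
or_result = matches(doc, optional=[title_clause, content_clause])
print(and_result, or_result)
```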

It can be useful to specify &explainOther, which can show the scoring
for an arbitrary doc, even one with a zero score.

https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters
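
A minimal sketch of building such a request, assuming the parameter is spelled `explainOther` as on the Common Query Parameters page and is sent together with `debug=true`. The host, the collection name `mycollection`, and the `id:42` selector are placeholders, not values from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_explain_other_url(main_query, other_doc_query):
    """Return a select URL asking Solr to explain a specific doc against main_query."""
    params = {
        "q": main_query,
        "debug": "true",                  # turn on debug output
        "explainOther": other_doc_query,  # query selecting the doc(s) to explain
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


explain_url = build_explain_other_url(
    'title:"connected vehicle" AND content:"connected vehicle"',
    "id:42",  # hypothetical id of the doc that scored zero
)
print(explain_url)
```

The explain output for the selected doc should then show which clause failed to match.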

Best,
Erick

On Wed, Jan 18, 2017 at 12:53 AM, Tom Chiverton  wrote:

> I 'solved' this by removing some of the 'AND' from my full query. AND
> should be optional but have no effect if there, right ? But for me it was
> forcing the score to 0.
>
>
> Which might be the same as saying nothing matched ?
>
>
> Tom
>
> On 13/01/17 15:10, Tom Chiverton wrote:
>
> I have a few hundred documents with title and content fields.
>
> I want a match in title to trump matches in content. If I search for
> "connected vehicle" then a news article that has that in the content
> shouldn't be ranked higher than the page with that in the title is
> essentially what I want.
>
> I have tried dismax with qf=title^2 as well as several other variants with
> the standard query parser (like q="title:"foo"^2 OR content:"foo") but
> documents without the search term in the title still come out before those
> with the term in the title when ordered by score.
>
> Is there something I am missing ?
>
> From the docs, something like q=title:"connected vehicle"^2 OR
> content:"connected vehicle" should have worked ? Even using ^100 didn't
> help.
>
> I tried with the dismax parser using
>
>   "q": "Connected Vehicle",
>   "defType": "dismax",
>   "indent": "true",
>   "qf": "title^2000 content",
>   "pf": "pf=title^4000 content^2",
>   "sort": "score desc",
>   "wt": "json",
> but that was not better. if I remove content from pf/qf then documents seem 
> to rank correctly.
>
> Example query and results (content omitted) :
> http://pastebin.com/5EhrRJP8 with
> managed-schema http://pastebin.com/mdraWQWE
>
> --
> *Tom Chiverton*
> Lead Developer
> e:  t...@extravision.com
> p:  0161 817 2922
> t:  @extravision 
> w:  www.extravision.com
> Registered in the UK at: 107 Timber Wharf, 33 Worsley Street, Manchester,
> M15 4LD.
> Company Reg No: 05017214 VAT: GB 824 5386 19
>
> This e-mail is intended solely for the person to whom it is addressed and
> may contain confidential or privileged information.
> Any views or opinions presented in this e-mail are solely of the author
> and do not necessarily represent those of Extravision Ltd.
>
>
>


Re: Trouble boosting a field -solved-

2017-01-18 Thread Tom Chiverton
I 'solved' this by removing some of the 'AND' from my full query. AND 
should be optional but have no effect if there, right ? But for me it 
was forcing the score to 0.



Which might be the same as saying nothing matched ?


Tom


On 13/01/17 15:10, Tom Chiverton wrote:

I have a few hundred documents with title and content fields.

I want a match in title to trump matches in content. If I search for 
"connected vehicle" then a news article that has that in the content 
shouldn't be ranked higher than the page with that in the title is 
essentially what I want.


I have tried dismax with qf=title^2 as well as several other variants 
with the standard query parser (like q="title:"foo"^2 OR 
content:"foo") but documents without the search term in the title 
still come out before those with the term in the title when ordered by 
score.


Is there something I am missing ?

From the docs, something like q=title:"connected vehicle"^2 OR 
content:"connected vehicle" should have worked ? Even using ^100 
didn't help.


I tried with the dismax parser using

   "q": "Connected Vehicle",
   "defType": "dismax",
   "indent": "true",
   "qf": "title^2000 content",
   "pf": "pf=title^4000 content^2",
   "sort": "score desc",
   "wt": "json",

but that was not better. if I remove content from pf/qf then documents seem to 
rank correctly.

Example query and results (content omitted) : 
http://pastebin.com/5EhrRJP8 with managed-schema 
http://pastebin.com/mdraWQWE








Re: Trouble boosting a field

2017-01-16 Thread Alan Woodward
Just accessible from your browser, so if you have a machine that’s inside your 
firewall but can see the outside world then it will work.

Alan Woodward
www.flax.co.uk


> On 16 Jan 2017, at 09:47, Tom Chiverton  wrote:
> 
> Ohh, that's handy ! But it needs Solr/ElasticSearch to be publicly accessible 
> ?
> 
> 
> On 14/01/17 09:23, Alan Woodward wrote:
>> http://splainer.io/  from the gents at 
>> OpenSourceConnections is pretty good for this sort of thing, I find…
>> 
>> Alan Woodward
>> www.flax.co.uk
>> 
>> 
>>> On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:
>>> 
>>> Well, I've tried much larger values than 8, and it still doesn't seem to do 
>>> the job ?
>>> 
>>> For now, assume my users are searching for exact sub strings of a real 
>>> title.
>>> 
>>> Tom
>>> 
>>> 
>>> On 13/01/17 16:22, Walter Underwood wrote:
>>>> I use a boost of 8 for title with no boost on the content. Both Infoseek 
>>>> and Inktomi settled on the 8X boost, getting there with completely 
>>>> different methodologies.
>>>> 
>>>> You might not want the title to completely trump the content. That causes 
>>>> some odd anomalies. If someone searches for “ice age 2”, do you really 
>>>> want every title with “2” to come before “ice age two”? Or a search for 
>>>> “steve jobs” to return every article with “job” or “jobs” in the title 
>>>> first?
>>>> 
>>>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>>>> years ago.
>>>> 
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/  (my blog)
>>>> 
>>>> 
>>>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
>>>>> 
>>>>> I have a few hundred documents with title and content fields.
>>>>> 
>>>>> I want a match in title to trump matches in content. If I search for 
>>>>> "connected vehicle" then a news article that has that in the content 
>>>>> shouldn't be ranked higher than the page with that in the title is 
>>>>> essentially what I want.
>>>>> 
>>>>> I have tried dismax with qf=title^2 as well as several other variants 
>>>>> with the standard query parser (like q="title:"foo"^2 OR content:"foo") 
>>>>> but documents without the search term in the title still come out before 
>>>>> those with the term in the title when ordered by score.
>>>>> 
>>>>> Is there something I am missing ?
>>>>> 
>>>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>>>> content:"connected vehicle" should have worked ? Even using ^100 didn't 
>>>>> help.
>>>>> 
>>>>> I tried with the dismax parser using
>>>>> 
>>>>>   "q": "Connected Vehicle",
>>>>>   "defType": "dismax",
>>>>>   "indent": "true",
>>>>>   "qf": "title^2000 content",
>>>>>   "pf": "pf=title^4000 content^2",
>>>>>   "sort": "score desc",
>>>>>   "wt": "json",
>>>>> 
>>>>> but that was not better. if I remove content from pf/qf then documents 
>>>>> seem to rank correctly.
>>>>> Example query and results (content omitted) : 
>>>>> http://pastebin.com/5EhrRJP8  with 
>>>>> managed-schema http://pastebin.com/mdraWQWE 
>>>>> 
>>>> __
>>>> This email has been scanned by the Symantec Email Security.cloud service.
>>>> For more information please visit http://www.symanteccloud.com
>>>> __
>> 
> 



Re: Trouble boosting a field

2017-01-16 Thread Tom Chiverton
Ohh, that's handy ! But it needs Solr/ElasticSearch to be publicly 
accessible ?



On 14/01/17 09:23, Alan Woodward wrote:

http://splainer.io/  from the gents at 
OpenSourceConnections is pretty good for this sort of thing, I find…

Alan Woodward
www.flax.co.uk



On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:

Well, I've tried much larger values than 8, and it still doesn't seem to do the 
job ?

For now, assume my users are searching for exact sub strings of a real title.

Tom


On 13/01/17 16:22, Walter Underwood wrote:

I use a boost of 8 for title with no boost on the content. Both Infoseek and 
Inktomi settled on the 8X boost, getting there with completely different 
methodologies.

You might not want the title to completely trump the content. That causes some 
odd anomalies. If someone searches for “ice age 2”, do you really want every 
title with “2” to come before “ice age two”? Or a search for “steve jobs” to 
return every article with “job” or “jobs” in the title first?

Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five years 
ago.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:

I have a few hundred documents with title and content fields.

I want a match in title to trump matches in content. If I search for "connected 
vehicle" then a news article that has that in the content shouldn't be ranked higher 
than the page with that in the title is essentially what I want.

I have tried dismax with qf=title^2 as well as several other variants with the standard query parser 
(like q="title:"foo"^2 OR content:"foo") but documents without the search term 
in the title still come out before those with the term in the title when ordered by score.

Is there something I am missing ?

 From the docs, something like q=title:"connected vehicle"^2 OR content:"connected 
vehicle" should have worked ? Even using ^100 didn't help.

I tried with the dismax parser using

   "q": "Connected Vehicle",
   "defType": "dismax",
   "indent": "true",
   "qf": "title^2000 content",
   "pf": "pf=title^4000 content^2",
   "sort": "score desc",
   "wt": "json",

but that was not better. if I remove content from pf/qf then documents seem to 
rank correctly.
Example query and results (content omitted) : http://pastebin.com/5EhrRJP8 
 with managed-schema http://pastebin.com/mdraWQWE 






Re: Trouble boosting a field

2017-01-14 Thread Alan Woodward
http://splainer.io/  from the gents at 
OpenSourceConnections is pretty good for this sort of thing, I find…

Alan Woodward
www.flax.co.uk


> On 13 Jan 2017, at 16:35, Tom Chiverton  wrote:
> 
> Well, I've tried much larger values than 8, and it still doesn't seem to do 
> the job ?
> 
> For now, assume my users are searching for exact sub strings of a real title.
> 
> Tom
> 
> 
> On 13/01/17 16:22, Walter Underwood wrote:
>> I use a boost of 8 for title with no boost on the content. Both Infoseek and 
>> Inktomi settled on the 8X boost, getting there with completely different 
>> methodologies.
>> 
>> You might not want the title to completely trump the content. That causes 
>> some odd anomalies. If someone searches for “ice age 2”, do you really want 
>> every title with “2” to come before “ice age two”? Or a search for “steve 
>> jobs” to return every article with “job” or “jobs” in the title first?
>> 
>> Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five 
>> years ago.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
>>> 
>>> I have a few hundred documents with title and content fields.
>>> 
>>> I want a match in title to trump matches in content. If I search for 
>>> "connected vehicle" then a news article that has that in the content 
>>> shouldn't be ranked higher than the page with that in the title is 
>>> essentially what I want.
>>> 
>>> I have tried dismax with qf=title^2 as well as several other variants with 
>>> the standard query parser (like q="title:"foo"^2 OR content:"foo") but 
>>> documents without the search term in the title still come out before those 
>>> with the term in the title when ordered by score.
>>> 
>>> Is there something I am missing ?
>>> 
>>> From the docs, something like q=title:"connected vehicle"^2 OR 
>>> content:"connected vehicle" should have worked ? Even using ^100 didn't 
>>> help.
>>> 
>>> I tried with the dismax parser using
>>> 
>>>   "q": "Connected Vehicle",
>>>   "defType": "dismax",
>>>   "indent": "true",
>>>   "qf": "title^2000 content",
>>>   "pf": "pf=title^4000 content^2",
>>>   "sort": "score desc",
>>>   "wt": "json",
>>> 
>>> but that was not better. if I remove content from pf/qf then documents seem 
>>> to rank correctly.
>>> Example query and results (content omitted) : http://pastebin.com/5EhrRJP8 
>>>  with managed-schema 
>>> http://pastebin.com/mdraWQWE 
>>> 
> 



Re: Trouble boosting a field

2017-01-13 Thread Tom Chiverton
Well, I've tried much larger values than 8, and it still doesn't seem to 
do the job ?


For now, assume my users are searching for exact sub strings of a real 
title.


Tom


On 13/01/17 16:22, Walter Underwood wrote:

I use a boost of 8 for title with no boost on the content. Both Infoseek and 
Inktomi settled on the 8X boost, getting there with completely different 
methodologies.

You might not want the title to completely trump the content. That causes some 
odd anomalies. If someone searches for “ice age 2”, do you really want every 
title with “2” to come before “ice age two”? Or a search for “steve jobs” to 
return every article with “job” or “jobs” in the title first?

Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five years 
ago.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:

I have a few hundred documents with title and content fields.

I want a match in title to trump matches in content. If I search for "connected 
vehicle" then a news article that has that in the content shouldn't be ranked higher 
than the page with that in the title is essentially what I want.

I have tried dismax with qf=title^2 as well as several other variants with the standard query parser 
(like q="title:"foo"^2 OR content:"foo") but documents without the search term 
in the title still come out before those with the term in the title when ordered by score.

Is there something I am missing ?

 From the docs, something like q=title:"connected vehicle"^2 OR content:"connected 
vehicle" should have worked ? Even using ^100 didn't help.

I tried with the dismax parser using

   "q": "Connected Vehicle",
   "defType": "dismax",
   "indent": "true",
   "qf": "title^2000 content",
   "pf": "pf=title^4000 content^2",
   "sort": "score desc",
   "wt": "json",

but that was not better. if I remove content from pf/qf then documents seem to 
rank correctly.
Example query and results (content omitted) : http://pastebin.com/5EhrRJP8 
 with managed-schema http://pastebin.com/mdraWQWE 






Re: Trouble boosting a field

2017-01-13 Thread Walter Underwood
I use a boost of 8 for title with no boost on the content. Both Infoseek and 
Inktomi settled on the 8X boost, getting there with completely different 
methodologies.

You might not want the title to completely trump the content. That causes some 
odd anomalies. If someone searches for “ice age 2”, do you really want every 
title with “2” to come before “ice age two”? Or a search for “steve jobs” to 
return every article with “job” or “jobs” in the title first?

Also, use “edismax”, not “dismax”. Dismax was obsolete in Solr 3.x, five years 
ago.
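
A minimal sketch of request parameters along these lines, assuming the `title` and `content` field names from the original post. The helper name `edismax_params` and the `fl` value are illustrative choices, not something given in this thread.

```python
from urllib.parse import urlencode


def edismax_params(user_query, title_boost=8):
    """Build select parameters using edismax with a modest title boost."""
    return {
        "q": user_query,
        "defType": "edismax",                  # edismax rather than dismax
        "qf": f"title^{title_boost} content",  # boost title, leave content at 1
        "fl": "*,score",                       # return the score with each doc
        "wt": "json",
    }


params = edismax_params("connected vehicle")
query_string = urlencode(params)
print(query_string)
```

Keeping the boost as an argument makes it easy to compare 8 against larger values while watching the returned scores.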

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 13, 2017, at 7:10 AM, Tom Chiverton  wrote:
> 
> I have a few hundred documents with title and content fields. 
> 
> I want a match in title to trump matches in content. If I search for 
> "connected vehicle" then a news article that has that in the content 
> shouldn't be ranked higher than the page with that in the title is 
> essentially what I want.
> 
> I have tried dismax with qf=title^2 as well as several other variants with 
> the standard query parser (like q="title:"foo"^2 OR content:"foo") but 
> documents without the search term in the title still come out before those 
> with the term in the title when ordered by score.
> 
> Is there something I am missing ?
> 
> From the docs, something like q=title:"connected vehicle"^2 OR 
> content:"connected vehicle" should have worked ? Even using ^100 didn't help.
> 
> I tried with the dismax parser using 
> 
>   "q": "Connected Vehicle",
>   "defType": "dismax",
>   "indent": "true",
>   "qf": "title^2000 content",
>   "pf": "pf=title^4000 content^2",
>   "sort": "score desc",
>   "wt": "json",
> 
> but that was not better. if I remove content from pf/qf then documents seem 
> to rank correctly.
> Example query and results (content omitted) : http://pastebin.com/5EhrRJP8 
>  with managed-schema 
> http://pastebin.com/mdraWQWE 
> 
> 



Re: Trouble boosting a field

2017-01-13 Thread Erick Erickson
Tom:

The output is numbing, but add &debug=true to your query and you'll see
exactly what contributed to the score and why. Otherwise you're flying
blind. Obviously something's trumping your boosting, but you can't pin down
what without the numbers.

You can get an overall sense of what's happening if you return "score" as
an additional field, but that just gives you the result, not how it was
calculated. However, if you notice your boosting has changed the scores in
the right direction but just not enough it's an indication that bigger
boosts may help.

And do note that boosting _influences_ the score, it'll never guarantee an
absolute ordering where "all titles that have the content will appear
before any doc where the terms appear in the content".
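
A toy illustration of that point, using a simplified disjunction-max model in which a doc's score is its best boosted field score. The raw field scores are invented numbers, and real Lucene scoring has more factors than this sketch shows.

```python
def dismax_score(field_scores, boosts):
    """Toy disjunction-max: the best boosted field score wins (no tie breaker)."""
    return max(boosts[field] * score for field, score in field_scores.items())


boosts = {"title": 8.0, "content": 1.0}

# Invented raw scores: a weak title-only match and a strong content-only match.
title_only_doc = {"title": 0.5, "content": 0.0}
content_only_doc = {"title": 0.0, "content": 5.0}

title_only = dismax_score(title_only_doc, boosts)      # 8.0 * 0.5
content_only = dismax_score(content_only_doc, boosts)  # 1.0 * 5.0
print(title_only, content_only)
```

With these numbers the content-only doc still outranks the title-only doc despite the 8x title boost, because the boost multiplies a score and does not impose an ordering.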

It'll also be easier to read if you output it structured, see:
https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured
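
A minimal sketch of a request with these debug options switched on. The host, the collection name `mycollection`, and the `id` field in `fl` are assumptions for illustration; the `qf` value is the one from the original post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(user_query):
    """Return a select URL that asks for scores plus a structured explain."""
    params = {
        "q": user_query,
        "defType": "dismax",
        "qf": "title^2000 content",
        "fl": "id,title,score",              # show the score next to each doc
        "debug": "true",                     # include the explain section
        "debug.explain.structured": "true",  # nested output instead of one string
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


debug_url = build_debug_url("Connected Vehicle")
print(debug_url)
```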


Best,
Erick

On Fri, Jan 13, 2017 at 7:10 AM, Tom Chiverton  wrote:

> I have a few hundred documents with title and content fields.
>
> I want a match in title to trump matches in content. If I search for
> "connected vehicle" then a news article that has that in the content
> shouldn't be ranked higher than the page with that in the title is
> essentially what I want.
>
> I have tried dismax with qf=title^2 as well as several other variants with
> the standard query parser (like q="title:"foo"^2 OR content:"foo") but
> documents without the search term in the title still come out before those
> with the term in the title when ordered by score.
>
> Is there something I am missing ?
>
> From the docs, something like q=title:"connected vehicle"^2 OR
> content:"connected vehicle" should have worked ? Even using ^100 didn't
> help.
>
> I tried with the dismax parser using
>
>   "q": "Connected Vehicle",
>   "defType": "dismax",
>   "indent": "true",
>   "qf": "title^2000 content",
>   "pf": "pf=title^4000 content^2",
>   "sort": "score desc",
>   "wt": "json",
> but that was not better. if I remove content from pf/qf then documents seem 
> to rank correctly.
>
> Example query and results (content omitted) : http://pastebin.com/5EhrRJP8
> with managed-schema http://pastebin.com/mdraWQWE
>
>


Trouble boosting a field

2017-01-13 Thread Tom Chiverton

I have a few hundred documents with title and content fields.

I want a match in title to trump matches in content. Essentially, if I 
search for "connected vehicle", a news article that has the phrase in its 
content shouldn't be ranked higher than the page that has it in its 
title.


I have tried dismax with qf=title^2 as well as several other variants 
with the standard query parser (like q=title:"foo"^2 OR content:"foo") 
but documents without the search term in the title still come out before 
those with the term in the title when ordered by score.


Is there something I am missing ?

From the docs, something like q=title:"connected vehicle"^2 OR 
content:"connected vehicle" should have worked ? Even using ^100 didn't 
help.


I tried with the dismax parser using

   "q": "Connected Vehicle",
   "defType": "dismax",
   "indent": "true",
   "qf": "title^2000 content",
   "pf": "pf=title^4000 content^2",
   "sort": "score desc",
   "wt": "json",

but that was not better. if I remove content from pf/qf then documents seem to 
rank correctly.


Example query and results (content omitted) : 
http://pastebin.com/5EhrRJP8 with managed-schema 
http://pastebin.com/mdraWQWE

