Re: Issue with highlighter

2017-06-18 Thread Erick Erickson
Works perfectly for me. Let's see:

- Your solrconfig file, particularly the "select" handler.
- The field definition you use for the content field. Be sure to include the
  associated fieldType.
- The results of debug=on attached to the query.
- What version of Solr?

Best,
Erick

Re: Issue with highlighter

2017-06-17 Thread Ali Husain
Damien, I tried that too before I sent the email. Nothing :/


http://localhost:8983/solr/testHighlight/select?hl.q=something=*=on=on=something=json


This is a bug, right?

Re: Issue with highlighter

2017-06-15 Thread Damien Kamerman
Ali, does adding a 'hl.q' param help?  q=something=something&...
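Damien's suggestion separates the query that selects documents (q) from the query the highlighter uses (hl.q). Below is a minimal Python sketch that builds such a request with every parameter properly separated by `&`. The core URL is the one used in this thread, and `highlight_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import urlencode

# Core URL taken from this thread; adjust host and core name for your setup.
SELECT_URL = "http://localhost:8983/solr/testHighlight/select"


def highlight_url(q, hl_q=None, hl_fl="*"):
    """Build a /select URL; hl.q (if given) is the query used only for highlighting."""
    params = [
        ("q", q),          # selects the documents
        ("hl", "on"),      # turns highlighting on
        ("hl.fl", hl_fl),  # fields to highlight
        ("wt", "json"),
    ]
    if hl_q is not None:
        params.append(("hl.q", hl_q))
    # urlencode joins the pairs with '&' and percent-encodes values such as '*'.
    return SELECT_URL + "?" + urlencode(params)


print(highlight_url("something", hl_q="something"))
```

Building the URL from (name, value) pairs avoids hand-typed separators getting lost when a request is copied into an email or a browser.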

Re: Issue with highlighter

2017-06-15 Thread Ali Husain
Thanks for the replies. Let me try and explain this a little better.


I haven't modified anything in solrconfig. All I did was get a fresh instance 
of Solr 6.4.1 and create a core testHighlight. I then created a content field 
of type text_en via the Solr Admin UI. id was already there, and that is of 
type string.


I then use the UI once again to check the hl checkbox; hl.fl is set to * 
because I want any and every match.


I push the following content into this new Solr instance:

id:91101

content:'I am adding something to the core field and we will try and find it. 
We want to make sure the highlighter works!

This is short so fragsize and max characters shouldn\'t be an issue.'

As you can see, there are very few characters, so fragsize, maxAnalyzedChars 
and all that should not be an issue.


I then send this query:

http://localhost:8983/solr/testHighlight/select?hl.fl=*=on=on=something=json


My results:


"response":{"numFound":1,"start":0,"docs":[
    {"id":"91101",
     "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
     "_version_":1570302668841156608}]
},
"highlighting":{
  "91101":{}}


I change q to be core instead of something.


http://localhost:8983/solr/testHighlight/select?hl.fl=*=on=on=core=json


{
  "id":"91101",
  "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
  "_version_":1570302668841156608},

"highlighting":{
  "91101":{
    "content":["I am adding something to the core field and we will try and find it. We want to make sure"]}}

I've tried a bunch of queries. 'adding' and 'something' both return no 
highlights, while 'core', 'am' and 'field' all work.
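When trying many query terms like this, it can help to flag mechanically which documents came back with an empty highlighting entry. A minimal sketch, assuming the JSON response format (wt=json) shown above; `ids_without_snippets` is a hypothetical helper name, and the sample strings are trimmed copies of the two highlighting sections above.

```python
import json


def ids_without_snippets(response_json):
    """Return the ids whose entry in the "highlighting" section has no snippets."""
    highlighting = json.loads(response_json).get("highlighting", {})
    # An entry counts as empty if it is {} or every field maps to an empty list.
    return sorted(doc_id for doc_id, fields in highlighting.items()
                  if not any(fields.values()))


# Highlighting sections from the q=something and q=core runs above.
something = '{"highlighting": {"91101": {}}}'
core = ('{"highlighting": {"91101": {"content": ["I am adding something to the '
        'core field and we will try and find it. We want to make sure"]}}}')
print(ids_without_snippets(something))  # ['91101']
print(ids_without_snippets(core))       # []
```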

Am I doing a better job of explaining this? It's quite puzzling why this would be 
happening. My guess is that some file/config somewhere is ignoring certain words, 
but it isn't stopwords.txt in my case. If that isn't the case, then it definitely 
seems like a bug to me.

Thanks, Ali

Re: Issue with highlighter

2017-06-14 Thread David Smiley
> Beware of NOT plus OR in a search. That will certainly produce no
highlights. (eg test -results when default op is OR)

Seems like a bug to me. I think the default operator shouldn't matter in that
case, since there is only one clause without a BooleanQuery.Occur operator, so
OR vs. AND makes no difference. The net effect is that "test" is effectively
required and should definitely be highlighted.
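David's reasoning can be made concrete with a toy model of Lucene's BooleanQuery matching rules. This is a simplification for illustration only: it ignores scoring, minimumShouldMatch, and text analysis, and `boolean_match` is a hypothetical helper, not Solr or Lucene code.

```python
def boolean_match(doc_terms, should=(), must=(), must_not=()):
    """Toy model of Lucene BooleanQuery matching for a single document."""
    doc = set(doc_terms)
    if not should and not must:
        # Only MUST_NOT clauses: a plain Lucene BooleanQuery matches nothing.
        return False
    if any(term in doc for term in must_not):
        return False  # any MUST_NOT term excludes the document
    if not all(term in doc for term in must):
        return False  # every MUST term is required
    if must:
        return True   # SHOULD clauses are then optional (they only affect scoring)
    # No MUST clauses: at least one SHOULD clause has to match.
    return any(term in doc for term in should)


# "test -results" with the default operator OR: "test" is SHOULD, "results" is MUST_NOT.
query = {"should": ["test"], "must_not": ["results"]}
for doc in (["test", "highlighter"], ["highlighter"], ["test", "results"]):
    print(doc, boolean_match(doc, **query))
```

Because "test" is the only non-negated clause, every matching document must contain it, whichever default operator is used; that is why it should always be available to highlight.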

Note to Ali: Phil's comment implies use of hl.method=unified which is not
the default.
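To try the unified highlighter, hl.method has to be set explicitly on the request. A hedged sketch, assuming Solr 6.4 or later (where hl.method=unified is available) and the `content` field from this thread; the helper name and the deliberately restricted set of accepted methods are illustrative choices, not Solr's own validation.

```python
from urllib.parse import urlencode

# A restricted set of hl.method values this sketch accepts; "original" is the default.
KNOWN_METHODS = {"original", "fastVector", "unified"}


def highlight_params(q, method="unified", fields="content"):
    """Return (name, value) pairs asking Solr to use a specific highlighter."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unexpected hl.method: {method!r}")
    return [("q", q), ("hl", "on"), ("hl.fl", fields), ("hl.method", method)]


# Phil's example query, sent to the unified highlighter.
query_string = urlencode(highlight_params("test -results"))
print(query_string)
```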

-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


RE: Issue with highlighter

2017-06-14 Thread Phil Scadden
Just had a similar issue - works for some, not others. The first thing to look at is 
hl.maxAnalyzedChars in the query. The default is quite small.
Since many of my documents are large PDF files, I opted to use 
storeOffsetsWithPositions="true" termVectors="true" on the field I was 
searching on.
This certainly did increase my index size, but not too badly, and it is certainly fast.
https://cwiki.apache.org/confluence/display/solr/Highlighting
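The field-level options Phil describes could be applied through the Schema API. This is a minimal sketch, assuming the `content` field and `text_en` type from Ali's test core; `replace_field_command` is a hypothetical helper. Note that replace-field swaps in the whole field definition, so any other properties the field should keep must be listed too, and existing documents have to be reindexed before these index-time options take effect.

```python
import json


def replace_field_command(name, field_type, **extra_props):
    """Build a Schema API "replace-field" request body for one field.

    replace-field replaces the whole definition, so every property the
    field should keep has to be listed here.
    """
    field = {"name": name, "type": field_type, "indexed": True, "stored": True}
    field.update(extra_props)
    return json.dumps({"replace-field": field})


body = replace_field_command(
    "content",
    "text_en",
    storeOffsetsWithPositions=True,  # offsets stored in the postings (used by the unified highlighter)
    termVectors=True,                # as in the message above; grows the index
)
print(body)
```

The resulting body would be POSTed to /solr/testHighlight/schema with Content-Type: application/json.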

Beware of NOT plus OR in a search. That will certainly produce no highlights. 
(e.g. "test -results" when the default op is OR)

Notice: This email and any attachments are confidential and may not be used, 
published or redistributed without the prior written consent of the Institute 
of Geological and Nuclear Sciences Limited (GNS Science). If received in error 
please destroy and immediately notify GNS Science. Do not copy or disclose the 
contents.


Re: Issue with highlighter

2017-06-14 Thread Erick Erickson
If the default operator is OR, then you're just matching on the "like"
word and it's being properly highlighted. If you're saying that doc
286 (or whatever) has both "something" and "like" in the content and
you expect to find them both, try increasing the number of snippets
returned.
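The first point can be illustrated with a toy model: under a default OR, a document containing only "like" still matches "something like", and only the term actually present can be highlighted. This is a simplified sketch (plain lowercase word tokens, no stemming, stopwords, or snippet selection), not how Solr analyzes or highlights text; the helper names are hypothetical.

```python
import re


def tokens(text):
    """Crude stand-in for analysis: lowercase word tokens, no stemming or stopwords."""
    return re.findall(r"\w+", text.lower())


def matches_or(query_terms, text):
    """Default operator OR: the document matches if ANY query term occurs."""
    return bool({t.lower() for t in query_terms} & set(tokens(text)))


def highlight(query_terms, text, pre="<em>", post="</em>"):
    """Wrap only the query terms that actually occur in the text."""
    wanted = {t.lower() for t in query_terms}

    def mark(m):
        word = m.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", mark, text)


snippet = "1949 Convention, like those"  # doc 309 snippet from Ali's first message
terms = ["something", "like"]
print(matches_or(terms, snippet))  # True: "like" alone satisfies OR
print(highlight(terms, snippet))   # 1949 Convention, <em>like</em> those
```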

Otherwise we need to see the _complete_ query and response, preferably
with =true. Plus your schema, plus a sample document and exactly
what you think should be happening that isn't.

Best,
Erick



Issue with highlighter

2017-06-14 Thread Ali Husain
Hi,


I think I've found a bug with the highlighter. I search for the word 
"something" and I get an empty highlighting response for all the documents that 
are returned, shown below. The fields that I am searching over are text_en, and the 
highlighter works for a lot of queries. I have no stopwords.txt list that could 
be messing this up either.


 "highlighting":{
"310":{},
"103":{},
"406":{},
"1189":{},
"54":{},
"292":{},
"309":{}}}


Just changing the search term to "something like" I get back this:


"highlighting":{
"310":{},
"309":{
  "content":["1949 Convention, like those"]},
"103":{},
"406":{},
"1189":{},
"54":{},
"292":{},
"286":{
  "content":["persons in these classes are treated like combatants, but in other respects"]},
"336":{
  "content":["   be treated like engagement"]}}}


So I know that I have it set up correctly, but I can't figure this out. I've 
searched through JIRA/Google and haven't been able to find a similar issue.


Any ideas?


Thanks,

Ali