RE: Highlighter not working on some documents

2017-06-12 Thread Phil Scadden
I managed to miss that. Thanks very much. I have some very large documents. I 
will look at index size and look at posting instead.

-----Original Message-----
From: David Smiley [mailto:david.w.smi...@gmail.com]
Sent: Monday, 12 June 2017 2:40 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Highlighter not working on some documents

Probably the most common reason is the default hl.maxAnalyzedChars -- thus your 
highlightable text might not be in the first 51200 chars of text.  The first 
Solr release with the unified highlighter had an even lower default of 10k 
chars.

On Fri, Jun 9, 2017 at 9:58 PM Phil Scadden <p.scad...@gns.cri.nz> wrote:

> Tried hard to find difference between pdfs returning no highlighter
> and ones that do for same search term.  Includes pdfs that have been
> OCRed and ones that were text to begin with. Head scratching to me.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 10 June 2017 6:22 a.m.
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Highlighter not working on some documents
>
> Need lots more information. I.e. schema definitions, query you use,
> handler configuration and the like. Note that highlighted fields must
> have stored="true" set and likely the _text_ field doesn't. At least
> in the default schemas stored is set to false for the catch-all field.
> And you don't want to store that information anyway since it's usually
> the destination of copyField directives and you'd highlight _those_ fields.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> > Do a search with:
> > fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
> >
> > and I get back a good list of documents. However, some documents are
> > returning empty fields in the highlighter. Eg, in the highlight array have:
> > "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
> >
> > Getting this well up the list of results with good highlighted matchers
> > above and below this entry. Why would the highlighter be failing?
> >
> > Notice: This email and any attachments are confidential and may not be
> > used, published or redistributed without the prior written consent of
> > the Institute of Geological and Nuclear Sciences Limited (GNS
> > Science). If received in error please destroy and immediately notify
> > GNS Science. Do not copy or disclose the contents.
>
--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Highlighter not working on some documents

2017-06-11 Thread David Smiley
Probably the most common reason is the default hl.maxAnalyzedChars -- thus
your highlightable text might not be in the first 51200 chars of text.  The
first Solr release with the unified highlighter had an even lower default
of 10k chars.
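
A rough way to check whether a given document is hitting this limit is to see where the first match sits in the extracted text. The sketch below is plain Java and only mimics the effect of hl.maxAnalyzedChars; it is not Solr's code, and a literal case-insensitive search is an assumption standing in for real analysis (stemming, tokenizing and so on). The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: approximates the effect of hl.maxAnalyzedChars, not Solr's implementation. */
final class HighlightWindowCheck {

    /** Default value quoted in this thread for hl.maxAnalyzedChars. */
    static final int DEFAULT_MAX_ANALYZED_CHARS = 51200;

    /** Offset of the first case-insensitive literal occurrence of term, or -1 if absent. */
    static int firstOffset(String text, String term) {
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.start() : -1;
    }

    /** True when the whole first occurrence lies inside the first maxAnalyzedChars characters. */
    static boolean withinWindow(String text, String term, int maxAnalyzedChars) {
        int offset = firstOffset(text, term);
        return offset >= 0 && offset + term.length() <= maxAnalyzedChars;
    }

    public static void main(String[] args) {
        // Build a body whose only match sits beyond the default window.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 60000) {
            sb.append("filler text ");
        }
        sb.append("pressure testing results");
        String body = sb.toString();

        // The match starts at offset 60000, past the 51200 default.
        System.out.println(firstOffset(body, "pressure"));
        // false: the highlighter would not reach the term with the default limit.
        System.out.println(withinWindow(body, "pressure", DEFAULT_MAX_ANALYZED_CHARS));
        // true: a larger limit covers it.
        System.out.println(withinWindow(body, "pressure", 100000));
    }
}
```

If the first match turns out to be past the limit, passing a larger hl.maxAnalyzedChars on the request should bring it into range, at the cost of more highlighting work per document.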

On Fri, Jun 9, 2017 at 9:58 PM Phil Scadden <p.scad...@gns.cri.nz> wrote:

> Tried hard to find difference between pdfs returning no highlighter and
> ones that do for same search term.  Includes pdfs that have been OCRed and
> ones that were text to begin with. Head scratching to me.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, 10 June 2017 6:22 a.m.
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Highlighter not working on some documents
>
> Need lots more information. I.e. schema definitions, query you use,
> handler configuration and the like. Note that highlighted fields must have
> stored="true" set and likely the _text_ field doesn't. At least in the
> default schemas stored is set to false for the catch-all field.
> And you don't want to store that information anyway since it's usually the
> destination of copyField directives and you'd highlight _those_ fields.
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> > Do a search with:
> > fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
> >
> > and I get back a good list of documents. However, some documents are
> > returning empty fields in the highlighter. Eg, in the highlight array have:
> > "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
> >
> > Getting this well up the list of results with good highlighted matchers
> > above and below this entry. Why would the highlighter be failing?
> >
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


RE: Highlighter not working on some documents

2017-06-09 Thread Phil Scadden
Tried hard to find difference between pdfs returning no highlighter and ones 
that do for same search term.  Includes pdfs that have been OCRed and ones that 
were text to begin with. Head scratching to me.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 10 June 2017 6:22 a.m.
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Highlighter not working on some documents

Need lots more information. I.e. schema definitions, query you use, handler 
configuration and the like. Note that highlighted fields must have 
stored="true" set and likely the _text_ field doesn't. At least in the default 
schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually the 
destination of copyField directives and you'd highlight _those_ fields.

Best,
Erick

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. Eg, in the highlight array have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> Getting this well up the list of results with good highlighted matchers above 
> and below this entry. Why would the highlighter be failing?
>


RE: Highlighter not working on some documents

2017-06-09 Thread Phil Scadden
Managed-schema attached (not a default) and the solrconfig.xml. _text_ is 
stored (not sure how else highlighting could work?). The indexer puts the 
body text of the pdf into the _text_ field. What would the value be in putting 
it into a different field and then using copyField?
 I.e.
 SolrInputDocument up = new SolrInputDocument();
 String content = textHandler.toString();
 up.addField("_text_",content);

 solr.add(up);

The puzzling thing for me is why some documents are producing highlights and 
others are not. The highlights in the documents that work are body text 
fragments, not things stored in some other field.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 10 June 2017 6:22 a.m.
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Highlighter not working on some documents

Need lots more information. I.e. schema definitions, query you use, handler 
configuration and the like. Note that highlighted fields must have 
stored="true" set and likely the _text_ field doesn't. At least in the default 
schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually the 
destination of copyField directives and you'd highlight _those_ fields.

Best,
Erick

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. Eg, in the highlight array have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> Getting this well up the list of results with good highlighted matchers above 
> and below this entry. Why would the highlighter be failing?
>



Re: Highlighter not working on some documents

2017-06-09 Thread Erick Erickson
Need lots more information. I.e. schema definitions, query you use,
handler configuration and the like. Note that highlighted fields must
have stored="true" set and likely the _text_ field doesn't. At least
in the default schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually
the destination of copyField directives and you'd highlight _those_
fields.

Best,
Erick
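
For anyone wanting to try the stored-source-field layout described above, here is a minimal sketch in plain Java that just builds the JSON body for Solr's Schema API (the add-field and add-copy-field commands), to be POSTed to the collection's /schema endpoint. The field name "content" and the type "text_general" are assumptions for illustration and need to match whatever the schema actually defines; the class and method names are hypothetical, and the names are assumed to be plain identifiers that need no JSON escaping.

```java
/** Sketch only: builds a Schema API request body; it does not talk to Solr. */
final class StoredFieldSchemaSketch {

    /**
     * JSON for a stored, indexed source field plus a copyField into the catch-all field.
     * Assumes the arguments are simple names with no characters that need escaping.
     */
    static String schemaCommands(String sourceField, String fieldType, String catchAllField) {
        return "{\n"
            + "  \"add-field\": {\"name\": \"" + sourceField + "\", \"type\": \"" + fieldType
            + "\", \"indexed\": true, \"stored\": true},\n"
            + "  \"add-copy-field\": {\"source\": \"" + sourceField
            + "\", \"dest\": \"" + catchAllField + "\"}\n"
            + "}";
    }

    public static void main(String[] args) {
        // "content" and "text_general" are illustrative choices, not taken from the thread.
        System.out.println(schemaCommands("content", "text_general", "_text_"));
    }
}
```

With a layout like this, searches can still go against _text_, while hl.fl would name the stored source field (here the hypothetical "content") so the highlighter has stored text to work from.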

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. Eg, in the highlight array have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> Getting this well up the list of results with good highlighted matchers above 
> and below this entry. Why would the highlighter be failing?
>