On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell <campbell.sh...@gmail.com>
wrote:

> Hi David
>
> Getting closer now.
>
> First of all, a bit of a mistake on my part. I have two cores set up and I
> was changing the solrconfig.xml on the wrong core, doh!! That's why
> highlighting wasn't being turned off.
>
> I think I've got the unified highlighter working.
> storeOffsetsWithPositions was already configured on my field type
> definition, not the field definition, so that was ok.
>
> What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> highlighting on some records and not others, making it confusing as to
> where the match is with my dismax parser.  I increased
> my hl.maxAnalyzedChars to 1300000 and now it's highlighting more records.
> Two questions:
>
> 1. Have you any guidelines as to what could be a
> maximum hl.maxAnalyzedChars without impacting performance or memory?
>

With storeOffsetsWithPositions, highlighting is super-fast, and so this
hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
amount of memory used if you have some truly humongous docs and it's okay to
only highlight the first X megabytes of them. Maybe set it to 100MB worth
of text, or something like that.
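
To make that concrete, here is a minimal sketch of the runtime parameters
for a unified-highlighter request with a generous cap. It assumes "100MB
worth of text" is roughly 100,000,000 characters (about one character per
byte), and the class name HighlightParams and the field name "content" are
hypothetical. The hl.* parameter names are real Solr parameters; the same
values could instead go into the request handler defaults in solrconfig.xml.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a Solr select query string for the unified highlighter. */
public class HighlightParams {

    // Roughly "100MB worth of text", assuming about one character per byte.
    static final int MAX_ANALYZED_CHARS = 100_000_000;

    static String buildQueryString(String userQuery, String highlightField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userQuery);
        params.put("hl", "true");                 // hl=false turns highlighting off
        params.put("hl.method", "unified");       // select the UnifiedHighlighter
        params.put("hl.fl", highlightField);      // field(s) to highlight
        params.put("hl.maxAnalyzedChars", Integer.toString(MAX_ANALYZED_CHARS));
        // URL-encode each key and value, then join them as key=value pairs.
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        // "content" is a hypothetical field name; substitute your own.
        System.out.println(buildQueryString("climate change", "content"));
    }
}
```

Appending the printed string to your /select URL is enough to try it;
nothing needs re-indexing just to change these parameters.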


> 2. Do you know a way to query the maximum length of text in a field so that
> I can set hl.maxAnalyzedChars accordingly? Just thinking I can probably
> modify my Java indexer to log the maximum content length. Actually, I
> probably don't want the maximum but some value that highlights 90-95% of
> records.
>

Eh... not really.  Maybe some approximation hacks involving function
queries on norms but I'd not bother in favor of just using a high threshold
such that this won't be an issue.
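
If you do want a number anyway, the indexer-side logging Shaun mentions is
simpler than anything on the Solr side. Below is a minimal sketch, not a
Solr feature: a hypothetical ContentLengthStats class that takes the
lengths your indexer recorded (e.g. content.length() per document, which I
assume is close to what the character limit is compared against) and
reports a nearest-rank percentile alongside the maximum.

```java
import java.util.Arrays;

/** Hypothetical indexer-side helper: summarizes field lengths to pick hl.maxAnalyzedChars. */
public class ContentLengthStats {

    /**
     * Nearest-rank percentile: the smallest recorded length such that at
     * least p percent of the documents are no longer than it.
     */
    static int percentile(int[] lengths, double p) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no lengths recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        int[] sorted = lengths.clone(); // don't reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Illustrative values only; a real indexer would collect one length
        // per document as it sends the content field to Solr.
        int[] lengths = {12_000, 48_000, 95_000, 130_000, 250_000, 610_000, 1_250_000};
        System.out.println("max = " + Arrays.stream(lengths).max().getAsInt());
        System.out.println("p90 = " + percentile(lengths, 90));
        System.out.println("p95 = " + percentile(lengths, 95));
    }
}
```

With only ~2,000 documents, keeping every length in memory and sorting
once at the end is cheap, so there's no need for a streaming estimator.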

All this said, this threshold is *not* the only reason why you might not be
getting highlights that you expect.  If you are using a recent Solr
version, you might try toggling the hl.weightMatches boolean, which could
make a difference for certain query arrangements.  There's a JIRA issue
pertaining to this one, and I haven't investigated it yet.
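
A quick way to check whether hl.weightMatches matters for a given query is
to send the same request twice, changing only that one parameter, and
compare which documents come back with snippets. A minimal sketch follows;
WeightMatchesToggle and the "content" field are hypothetical names, while
hl.weightMatches, hl.method, and hl.fl are real Solr parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: two otherwise-identical requests differing only in hl.weightMatches. */
public class WeightMatchesToggle {

    static String queryString(String userQuery, String field, boolean weightMatches) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", userQuery);
        params.put("hl", "true");
        params.put("hl.method", "unified");
        params.put("hl.fl", field);
        params.put("hl.weightMatches", Boolean.toString(weightMatches));
        // URL-encode each key and value, then join them as key=value pairs.
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Run the same query both ways and compare the highlighting sections.
        for (boolean wm : new boolean[] {true, false}) {
            System.out.println(queryString("\"ocean warming\"", "content", wm));
        }
    }
}
```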

~ David


>
> Thanks
> Shaun
>
> On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org> wrote:
>
> > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <campbell.sh...@gmail.com>
> > wrote:
> >
> > > Hi David
> > >
> > > First of all, I wanted to say I'm working off your book!! Third edition,
> > > and I think it's a bit out of date now. I was just going to try following
> > > the section on the Postings highlighter, but I see that's been absorbed
> > > into the Unified highlighter. I find your book easier to follow than the
> > > official documentation, though.
> > >
> >
> > Thanks :-D. I do maintain the Solr Reference Guide for the parts of the
> > code I touch, including highlighting, so I hope what's there makes sense too.
> >
> >
> > > I am going to try to configure the unified highlighter, and I will add
> > > that storeOffsetsWithPositions to the schema (which I saw in your book)
> > > and try indexing again from scratch. I was getting some funny things
> > > going on where I thought I'd turned highlighting off and it was still
> > > giving me highlights.
> > >
> >
> > hl=true/false
> >
> >
> > > Actually, just re-reading your email again, are you saying that you
> > > can't configure highlighting in solrconfig.xml? That's where I always
> > > configure the original highlighter in my dismax search handler. Am I
> > > supposed to add highlighting to each request?
> > >
> >
> > You can set highlighting and other *parameters* in solrconfig.xml for
> > request handlers. But the dedicated <highlighting> plugin info is only for
> > the original and Fast Vector Highlighters.
> >
> > ~ David
> >
> >
> > >
> > > Thanks
> > > Shaun
> > >
> > > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:
> > >
> > > > Hello!
> > > >
> > > > I worked on the UnifiedHighlighter a lot and want to help you!
> > > >
> > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com>
> > > > wrote:
> > > >
> > > > > I've been using highlighting for a while, using the original
> > > > > highlighter, and have just come across a problem with fields that
> > > > > contain a large amount of text, approx. 250k characters. I only have
> > > > > about 2,000 records, but each one contains a journal publication to
> > > > > search through.
> > > > >
> > > > > What I noticed is that some records didn't return a highlight even
> > > > > though they matched on the content. I noticed the hl.maxAnalyzedChars
> > > > > parameter and increased it, which allowed some records to be
> > > > > highlighted, but not all, and then it caused memory problems on the
> > > > > server. Performance is also very poor.
> > > > >
> > > >
> > > > I've been thinking hl.maxAnalyzedChars should maybe default to no
> > > > limit. It's a performance threshold, but perhaps it's better to opt in
> > > > to such a limit than to scratch your head for a long time wondering why
> > > > a search result isn't showing highlights.
> > > >
> > > >
> > > > > To try to fix this, I've tried to configure the unified highlighter
> > > > > in my solrconfig.xml instead. It seems to be working, but again I'm
> > > > > missing some highlighted records.
> > > > >
> > > >
> > > > There is no configuration of that highlighter in solrconfig.xml; it's
> > > > entirely parameter driven (runtime).
> > > >
> > > >
> > > > > The other thing is I've tried to adjust my unified highlighting
> > > > > settings in solrconfig.xml and they don't seem to be having any
> > > > > effect, even after restarting Solr. I was just wondering whether
> > > > > there is any highlighting information stored at index time. It's
> > > > > taking over 4 hours to index my records, so it's not easy to keep
> > > > > reindexing my content.
> > > > >
> > > > > Any ideas on how to handle highlighting of large content would be
> > > > > appreciated.
> > > > > Shaun
> > > > >
> > > >
> > > > Please read the documentation here thoroughly:
> > > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > > (or an earlier version as applicable).
> > > > Since you have large bodies of text to highlight, you would strongly
> > > > benefit from putting offsets into the search index (and re-indexing)
> > > > via storeOffsetsWithPositions. That's an option on the field/fieldType
> > > > in your schema; it may not be obvious reading the docs. You have to opt
> > > > in to that; Solr doesn't normally store any info in the index for
> > > > highlighting.
> > > >
> > > > ~ David Smiley
> > > > Apache Lucene/Solr Search Developer
> > > > http://www.linkedin.com/in/davidwsmiley
> > > >
> > >
> >
>
