The last update to highlighting that I think is pertinent to
whether highlights match or not is v7.6, which added the hl.weightMatches
option.  So I recommend upgrading to at least that if you want to
experiment further.  But... hl.weightMatches highlights more accurately and
as such is likely to highlight less than you are highlighting
now, and highlighting more appears to be your goal right now.
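
To compare the two settings side by side, a rough sketch like the one
below could build the request parameters for each case.  It is written
in Python only for illustration.  The helper name, the field name
"content", and the query text are placeholders and not taken from your
setup, and 100000000 is just the "100MB worth of text" ballpark
mentioned earlier in this thread.

```python
from urllib.parse import urlencode


def build_highlight_params(query, field, weight_matches,
                           max_analyzed_chars=100_000_000):
    """Return Solr request parameters for the unified highlighter.

    `field` is a hypothetical field name; the default threshold is an
    assumed ballpark, not a recommended value.
    """
    return {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.maxAnalyzedChars": str(max_analyzed_chars),
        # Toggle this to see whether it changes which documents get highlights.
        "hl.weightMatches": "true" if weight_matches else "false",
    }


if __name__ == "__main__":
    # Print one query string per setting; append each to your select URL.
    for flag in (True, False):
        print(urlencode(build_highlight_params("heart failure", "content", flag)))
```

Running both query strings against the same search and diffing which
documents come back with highlights should show whether the option
matters for your queries.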

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell <campbell.sh...@gmail.com>
wrote:

> That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll whack
> it right up and see what happens.
>
> I'm running 7.4 from a few years ago. Should I upgrade?
>
> For your info this is what I'm doing with Solr
> https://dev.fundingawards.nihr.ac.uk/search.
>
> Thanks
> Shaun
>
> On Tue, 12 Jan 2021 at 19:33, David Smiley <dsmi...@apache.org> wrote:
>
> > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell <campbell.sh...@gmail.com>
> > wrote:
> >
> > > Hi David
> > >
> > > Getting closer now.
> > >
> > > First of all, a bit of a mistake on my part. I have two cores set up and I
> > > was changing the solrconfig.xml on the wrong core doh!!  That's why
> > > highlighting wasn't being turned off.
> > >
> > > I think I've got the unified highlighter working.
> > > storeOffsetsWithPositions was already configured on my field type
> > > definition, not the field definition, so that was ok.
> > >
> > > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > > highlighting on some records and not others, making it confusing as to
> > > where the match is with my dismax parser.  I increased
> > > my hl.maxAnalyzedChars to 1300000 and now it's highlighting more
> > > records.
> > > Two questions:
> > >
> > > 1. Have you any guidelines as to what could be a
> > > maximum hl.maxAnalyzedChars without impacting performance or memory?
> > >
> >
> > With storeOffsetsWithPositions, highlighting is super-fast, and so this
> > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
> > amount of memory used if you have some truly humongous docs and it's okay
> > to only highlight the first X megabytes of them.  Maybe set it to 100MB
> > worth of text, or something like that.
> >
> >
> > > 2. Do you know a way to query the maximum length of text in a field so
> > > that I can set hl.maxAnalyzedChars accordingly?  Just thinking I can
> > > probably modify my java indexer to log the maximum content length.
> > > Actually, I probably don't want the maximum but some value that
> > > highlights 90-95% of records.
> > >
> >
> > Eh... not really.  Maybe some approximation hacks involving function
> > queries on norms, but I'd not bother; just use a high threshold
> > such that this won't be an issue.
> >
> > All this said, this threshold is *not* the only reason why you might not
> > be getting highlights that you expect.  If you are using a recent Solr
> > version, you might try toggling the hl.weightMatches boolean, which could
> > make a difference for certain query arrangements.  There's a JIRA issue
> > pertaining to this one, and I haven't investigated it yet.
> >
> > ~ David
> >
> >
> > >
> > > Thanks
> > > Shaun
> > >
> > > On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org> wrote:
> > >
> > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <campbell.sh...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi David
> > > > >
> > > > > First of all I wanted to say I'm working off your book!!  Third
> > > > > edition, and I think it's a bit out of date now. I was just going to try
> > > > > following the section on the Postings highlighter, but I see that's been
> > > > > absorbed into the Unified highlighter. I find your book easier to follow
> > > > > than the official documentation though.
> > > > >
> > > >
> > > > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> > > > code I touch, including highlighting, so I hope what's there makes sense
> > > > too.
> > > >
> > > >
> > > > > I am going to try to configure the unified highlighter, and I will add
> > > > > that storeOffsetsWithPositions to the schema (which I saw in your book)
> > > > > and I will try indexing again from scratch.  Was getting some funny
> > > > > things going on where I thought I'd turned highlighting off and it was
> > > > > still giving me highlights.
> > > > >
> > > >
> > > > hl=true/false
> > > >
> > > >
> > > > > Actually just re-reading your email again, are you saying that you
> > > > > can't configure highlighting in solrconfig.xml? That's where I always
> > > > > configure original highlighting in my dismax search handler. Am I
> > > > > supposed to add highlighting to each request?
> > > > >
> > > >
> > > > You can set highlighting and other *parameters* in solrconfig.xml for
> > > > request handlers.  But the dedicated <highlighting> plugin info is only
> > > > for the original and Fast Vector Highlighters.
> > > >
> > > > ~ David
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > > Shaun
> > > > >
> > > > > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:
> > > > >
> > > > > > Hello!
> > > > > >
> > > > > > I worked on the UnifiedHighlighter a lot and want to help you!
> > > > > >
> > > > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I've been using highlighting for a while, using the original
> > > > > > > highlighter, and just come across a problem with fields that contain a
> > > > > > > large amount of text, approx 250k characters. I only have about 2,000
> > > > > > > records but each one contains a journal publication to search through.
> > > > > > >
> > > > > > > What I noticed is that some records didn't return a highlight even
> > > > > > > though they matched on the content. I noticed the hl.maxAnalyzedChars
> > > > > > > parameter and increased that, but it allowed some records to be
> > > > > > > highlighted, but not all, and then it caused memory problems on the
> > > > > > > server.  Performance is also very poor.
> > > > > > >
> > > > > >
> > > > > > I've been thinking hl.maxAnalyzedChars should maybe default to no
> > > > > > limit -- it's a performance threshold but perhaps better to opt-in to
> > > > > > such a limit than scratch your head for a long time wondering why a
> > > > > > search result isn't showing highlights.
> > > > > >
> > > > > >
> > > > > > > To try to fix this I've tried to configure the unified highlighter in
> > > > > > > my solrconfig.xml instead.  It seems to be working but again I'm
> > > > > > > missing some highlighted records.
> > > > > > >
> > > > > >
> > > > > > There is no configuration of that highlighter in solrconfig.xml;
> > > > > > it's entirely parameter driven (runtime).
> > > > > >
> > > > > >
> > > > > > > The other thing is I've tried to adjust my unified highlighting
> > > > > > > settings in solrconfig.xml and they don't seem to be having any effect
> > > > > > > even after restarting Solr.  I was just wondering whether there is any
> > > > > > > highlighting information stored at index time. It's taking over 4 hours
> > > > > > > to index my records so it's not easy to keep reindexing my content.
> > > > > > >
> > > > > > > Any ideas on how to handle highlighting of large content would be
> > > > > > > appreciated.
> > > > > > >
> > > > > > > Shaun
> > > > > > >
> > > > > >
> > > > > > Please read the documentation here thoroughly:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > > > > (or earlier version as applicable)
> > > > > > Since you have large bodies of text to highlight, you would strongly
> > > > > > benefit from putting offsets into the search index (and re-index) --
> > > > > > storeOffsetsWithPositions.  That's an option on the field/fieldType in
> > > > > > your schema; it may not be obvious reading the docs.  You have to opt-in
> > > > > > to that; Solr doesn't normally store any info in the index for
> > > > > > highlighting.
> > > > > >
> > > > > > ~ David Smiley
> > > > > > Apache Lucene/Solr Search Developer
> > > > > > http://www.linkedin.com/in/davidwsmiley
> > > > > >
> > > > >
> > > >
> > >
> >
>
