Hi David

Just reindexed everything and it appears to be performing well and giving
me highlights for the matched text.

Thanks for your help.
Shaun

On Tue, 12 Jan 2021, 21:00 David Smiley, <dsmi...@apache.org> wrote:

> The last update to highlighting that I think is pertinent to
> whether highlights match or not is v7.6 which added that hl.weightMatches
> option.  So I recommend upgrading to at least that if you want to
> experiment further.  But... hl.weightMatches highlights more accurately and
> as such is more likely to not highlight as much as you are highlighting
> now, and highlighting more is your goal right now it appears.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell <campbell.sh...@gmail.com>
> wrote:
>
> > That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll whack
> > it right up and see what happens.
> >
> > I'm running 7.4 from a few years ago. Should I upgrade?
> >
> > For your info this is what I'm doing with Solr
> > https://dev.fundingawards.nihr.ac.uk/search.
> >
> > Thanks
> > Shaun
> >
> > On Tue, 12 Jan 2021 at 19:33, David Smiley <dsmi...@apache.org> wrote:
> >
> > > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell <campbell.sh...@gmail.com> wrote:
> > >
> > > > Hi David
> > > >
> > > > Getting closer now.
> > > >
> > > > > First of all, a bit of a mistake on my part. I have two cores set up and
> > > > > I was changing the solrconfig.xml on the wrong core doh!!  That's why
> > > > > highlighting wasn't being turned off.
> > > >
> > > > I think I've got the unified highlighter working.
> > > > storeOffsetsWithPositions was already configured on my field type
> > > > definition, not the field definition, so that was ok.
> > > >
> > > > > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > > > > highlighting on some records and not others, making it confusing as to
> > > > > where the match is with my dismax parser.  I increased
> > > > > my hl.maxAnalyzedChars to 1300000 and now it's highlighting more records.
> > > > Two questions:
> > > >
> > > > 1. Have you any guidelines as to what could be a
> > > > maximum hl.maxAnalyzedChars without impacting performance or memory?
> > > >
> > >
> > > > With storeOffsetsWithPositions, highlighting is super-fast, and so this
> > > > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
> > > > amount of memory used if you have some truly humongous docs and it's okay
> > > > to only highlight the first X megabytes of them.  Maybe set it to 100MB
> > > > worth of text, or something like that.
> > >
> > >
> > > > > 2. Do you know a way to query the maximum length of text in a field so
> > > > > that I can set hl.maxAnalyzedChars accordingly?  Just thinking I can
> > > > > probably modify my Java indexer to log the maximum content length.
> > > > > Actually, I probably don't want the maximum but some value that
> > > > > highlights 90-95% of records.
> > > >
> > >
> > > > Eh... not really.  Maybe some approximation hacks involving function
> > > > queries on norms but I'd not bother in favor of just using a high
> > > > threshold such that this won't be an issue.
> > >
> > > > All this said, this threshold is *not* the only reason why you might not
> > > > be getting highlights that you expect.  If you are using a recent Solr
> > > > version, you might try toggling the hl.weightMatches boolean, which could
> > > > make a difference for certain query arrangements.  There's a JIRA issue
> > > > pertaining to this one, and I haven't investigated it yet.
> > >
> > > ~ David
> > >
> > >
> > > >
> > > > Thanks
> > > > Shaun
> > > >
> > > > > On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org> wrote:
> > > >
> > > > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <campbell.sh...@gmail.com> wrote:
> > > > >
> > > > > > Hi David
> > > > > >
> > > > > > > First of all I wanted to say I'm working off your book!!  Third
> > > > > > > edition, and I think it's a bit out of date now. I was just going to
> > > > > > > try following the section on the Postings highlighter, but I see
> > > > > > > that's been absorbed into the Unified highlighter. I find your book
> > > > > > > easier to follow than the official documentation though.
> > > > > >
> > > > >
> > > > > > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> > > > > > code I touch, including highlighting, so I hope what's there makes
> > > > > > sense too.
> > > > >
> > > > >
> > > > > > > I am going to try to configure the unified highlighter, and I will
> > > > > > > add that storeOffsetsWithPositions to the schema (which I saw in your
> > > > > > > book) and I will try indexing again from scratch.  Was getting some
> > > > > > > funny things going on where I thought I'd turned highlighting off and
> > > > > > > it was still giving me highlights.
> > > > > >
> > > > >
> > > > > hl=true/false
> > > > >
> > > > >
> > > > > > > Actually just re-reading your email again, are you saying that you
> > > > > > > can't configure highlighting in solrconfig.xml? That's where I always
> > > > > > > configure original highlighting in my dismax search handler. Am I
> > > > > > > supposed to add highlighting to each request?
> > > > > >
> > > > >
> > > > > > You can set highlighting and other *parameters* in solrconfig.xml for
> > > > > > request handlers.  But the dedicated <highlighting> plugin info is only
> > > > > > for the original and Fast Vector Highlighters.
> > > > >
> > > > > ~ David
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Shaun
> > > > > >
> > > > > > > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org> wrote:
> > > > > >
> > > > > > > Hello!
> > > > > > >
> > > > > > > I worked on the UnifiedHighlighter a lot and want to help you!
> > > > > > >
> > > > > > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com> wrote:
> > > > > > >
> > > > > > > > > I've been using highlighting for a while, using the original
> > > > > > > > > highlighter, and have just come across a problem with fields that
> > > > > > > > > contain a large amount of text, approx 250k characters. I only have
> > > > > > > > > about 2,000 records but each one contains a journal publication to
> > > > > > > > > search through.
> > > > > > > >
> > > > > > > > > What I noticed is that some records didn't return a highlight even
> > > > > > > > > though they matched on the content. I noticed the hl.maxAnalyzedChars
> > > > > > > > > parameter and increased that, but it allowed some records to be
> > > > > > > > > highlighted, but not all, and then it caused memory problems on the
> > > > > > > > > server.  Performance is also very poor.
> > > > > > > >
> > > > > > >
> > > > > > > > I've been thinking hl.maxAnalyzedChars should maybe default to no
> > > > > > > > limit -- it's a performance threshold but perhaps better to opt-in to
> > > > > > > > such a limit than scratch your head for a long time wondering why a
> > > > > > > > search result isn't showing highlights.
> > > > > > >
> > > > > > >
> > > > > > > > > To try to fix this I've tried to configure the unified highlighter
> > > > > > > > > in my solrconfig.xml instead.  It seems to be working but again I'm
> > > > > > > > > missing some highlighted records.
> > > > > > > >
> > > > > > >
> > > > > > > > There is no configuration of that highlighter in solrconfig.xml;
> > > > > > > > it's entirely parameter driven (runtime).
> > > > > > >
> > > > > > >
> > > > > > > > > The other thing is I've tried to adjust my unified highlighting
> > > > > > > > > settings in solrconfig.xml and they don't seem to be having any
> > > > > > > > > effect even after restarting Solr.  I was just wondering whether
> > > > > > > > > there is any highlighting information stored at index time. It's
> > > > > > > > > taking over 4 hours to index my records so it's not easy to keep
> > > > > > > > > reindexing my content.
> > > > > > > >
> > > > > > > > > Any ideas on how to handle highlighting of large content would be
> > > > > > > > > appreciated.
> > > > > > > >
> > > > > > > > Shaun
> > > > > > > >
> > > > > > >
> > > > > > > > Please read the documentation here thoroughly:
> > > > > > > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > > > > > > (or earlier version as applicable)
> > > > > > > > Since you have large bodies of text to highlight, you would strongly
> > > > > > > > benefit from putting offsets into the search index (and re-index) --
> > > > > > > > storeOffsetsWithPositions.  That's an option on the field/fieldType in
> > > > > > > > your schema; it may not be obvious reading the docs.  You have to
> > > > > > > > opt-in to that; Solr doesn't normally store any info in the index for
> > > > > > > > highlighting.
> > > > > > >
> > > > > > > ~ David Smiley
> > > > > > > Apache Lucene/Solr Search Developer
> > > > > > > http://www.linkedin.com/in/davidwsmiley
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
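
A minimal sketch of what "entirely parameter driven" could look like from the client side, assuming the request URL is built by hand in Java. The hl, hl.method, hl.fl and hl.maxAnalyzedChars parameter names are Solr's; the class name, the field name "content" and the sample query are made up for illustration. As David notes in the quoted thread, the same parameters could instead be set as request handler defaults in solrconfig.xml.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr or SolrJ.
class HighlightParams {

    // Builds the query-string part of a search request that asks for
    // unified-highlighter output on one field.
    static String queryString(String userQuery, String field, int maxAnalyzedChars) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("hl", "true");             // hl=false turns highlighting off
        params.put("hl.method", "unified");   // select the unified highlighter
        params.put("hl.fl", field);           // field(s) to highlight
        params.put("hl.maxAnalyzedChars", Integer.toString(maxAnalyzedChars));
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample values only; append the result to the search handler URL.
        System.out.println(queryString("heart failure", "content", 1300000));
    }
}
```

Sending the parameters per request makes it easy to compare settings without touching the config of either core.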

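
On question 2 in the quoted thread (a value that highlights 90-95% of records): a rough sketch, assuming the indexer records each document's content length in characters as it goes. David's advice is simply to use a high threshold, so this only matters if a tighter cap is wanted. ContentLengthStats and the sample lengths are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for an indexer; not part of Solr or SolrJ.
class ContentLengthStats {

    // Returns the smallest recorded length that is greater than or equal to
    // at least `fraction` of all recorded lengths (nearest-rank percentile).
    static int lengthCovering(int[] lengths, double fraction) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("no lengths recorded");
        }
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        int[] sorted = lengths.clone();
        Arrays.sort(sorted);
        // Nearest-rank method: 1-based rank = ceil(fraction * n).
        int rank = (int) Math.ceil(fraction * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up character counts standing in for values logged while indexing.
        int[] lengths = {1200, 48000, 250000, 90000, 310000,
                         15000, 1300000, 220000, 60000, 175000};
        System.out.println("max: " + lengthCovering(lengths, 1.0));
        System.out.println("covers 90%: " + lengthCovering(lengths, 0.90));
    }
}
```

The returned value could then be used as hl.maxAnalyzedChars, with the caveat that documents longer than it would only be highlighted within their first part.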