Hi David

Just reindexed everything and it appears to be performing well and
giving me highlights for the matched text.
Thanks for your help.

Shaun

On Tue, 12 Jan 2021, 21:00 David Smiley, <dsmi...@apache.org> wrote:

> The last update to highlighting that I think is pertinent to whether
> highlights match or not is v7.6 which added that hl.weightMatches
> option. So I recommend upgrading to at least that if you want to
> experiment further. But... hl.weightMatches highlights more accurately
> and as such is more likely to not highlight as much as you are
> highlighting now, and highlighting more is your goal right now it
> appears.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell <campbell.sh...@gmail.com>
> wrote:
>
> > That's great David. So hl.maxAnalyzedChars isn't that critical. I'll
> > whack it right up and see what happens.
> >
> > I'm running 7.4 from a few years ago. Should I upgrade?
> >
> > For your info this is what I'm doing with Solr
> > https://dev.fundingawards.nihr.ac.uk/search.
> >
> > Thanks
> > Shaun
> >
> > On Tue, 12 Jan 2021 at 19:33, David Smiley <dsmi...@apache.org> wrote:
> >
> > > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell
> > > <campbell.sh...@gmail.com> wrote:
> > >
> > > > Hi David
> > > >
> > > > Getting closer now.
> > > >
> > > > First of all, a bit of a mistake on my part. I have two cores set
> > > > up and I was changing the solrconfig.xml on the wrong core doh!!
> > > > That's why highlighting wasn't being turned off.
> > > >
> > > > I think I've got the unified highlighter working.
> > > > storeOffsetsWithPositions was already configured on my field type
> > > > definition, not the field definition, so that was ok.
> > > >
> > > > What it boils down to now I think is hl.maxAnalyzedChars. I'm
> > > > getting highlighting on some records and not others, making it
> > > > confusing as to where the match is with my dismax parser.
> > > > I increased my hl.maxAnalyzedChars to 1300000 and now it's
> > > > highlighting more records. Two questions:
> > > >
> > > > 1. Have you any guidelines as to what could be a maximum
> > > > hl.maxAnalyzedChars without impacting performance or memory?
> > >
> > > With storeOffsetsWithPositions, highlighting is super-fast, and so
> > > this hl.maxAnalyzedChars threshold is of marginal utility, like only
> > > to cap the amount of memory used if you have some truly humongous
> > > docs and it's okay to only highlight the first X megabytes of them.
> > > Maybe set it to 100MB worth of text, or something like that.
> > >
> > > > 2. Do you know a way to query the maximum length of text in a
> > > > field so that I can set hl.maxAnalyzedChars accordingly? Just
> > > > thinking I can probably modify my java indexer to log the maximum
> > > > content length. Actually, I probably don't want the maximum but
> > > > some value that highlights 90-95% of records
> > >
> > > Eh... not really. Maybe some approximation hacks involving function
> > > queries on norms but I'd not bother in favor of just using a high
> > > threshold such that this won't be an issue.
> > >
> > > All this said, this threshold is *not* the only reason why you might
> > > not be getting highlights that you expect. If you are using a recent
> > > Solr version, you might try toggling the hl.weightMatches boolean,
> > > which could make a difference for certain query arrangements.
> > > There's a JIRA issue pertaining to this one, and I haven't
> > > investigated it yet.
> > >
> > > ~ David
> > >
> > > > Thanks
> > > > Shaun
> > > >
> > > > On Tue, 12 Jan 2021 at 16:30, David Smiley <dsmi...@apache.org>
> > > > wrote:
> > > >
> > > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell
> > > > > <campbell.sh...@gmail.com> wrote:
> > > > >
> > > > > > Hi David
> > > > > >
> > > > > > First of all I wanted to say I'm working off your book!! Third
> > > > > > edition, and I think it's a bit out of date now. I was just
> > > > > > going to try following the section on the Postings highlighter,
> > > > > > but I see that's been absorbed into the Unified highlighter. I
> > > > > > find your book easier to follow than the official documentation
> > > > > > though.
> > > > >
> > > > > Thanks :-D. I do maintain the Solr Reference Guide for the parts
> > > > > of code I touch, including highlighting, so I hope what's there
> > > > > makes sense too.
> > > > >
> > > > > > I am going to try to configure the unified highlighter, and I
> > > > > > will add that storeOffsetsWithPositions to the schema (which I
> > > > > > saw in your book) and I will try indexing again from scratch.
> > > > > > Was getting some funny things going on where I thought I'd
> > > > > > turned highlighting off and it was still giving me highlights.
> > > > >
> > > > > hl=true/false
> > > > >
> > > > > > Actually just re-reading your email again, are you saying that
> > > > > > you can't configure highlighting in solrconfig.xml? That's
> > > > > > where I always configure original highlighting in my dismax
> > > > > > search handler. Am I supposed to add highlighting to each
> > > > > > request?
> > > > >
> > > > > You can set highlighting and other *parameters* in solrconfig.xml
> > > > > for request handlers.
> > > > > But the dedicated <highlighting> plugin info is only for the
> > > > > original and Fast Vector Highlighters.
> > > > >
> > > > > ~ David
> > > > >
> > > > > > Thanks
> > > > > > Shaun
> > > > > >
> > > > > > On Mon, 11 Jan 2021 at 20:57, David Smiley <dsmi...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello!
> > > > > > >
> > > > > > > I worked on the UnifiedHighlighter a lot and want to help you!
> > > > > > >
> > > > > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell
> > > > > > > <campbell.sh...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I've been using highlighting for a while, using the
> > > > > > > > original highlighter, and just come across a problem with
> > > > > > > > fields that contain a large amount of text, approx 250k
> > > > > > > > characters. I only have about 2,000 records but each one
> > > > > > > > contains a journal publication to search through.
> > > > > > > >
> > > > > > > > What I noticed is that some records didn't return a
> > > > > > > > highlight even though they matched on the content. I
> > > > > > > > noticed the hl.maxAnalyzedChars parameter and increased
> > > > > > > > that, but it allowed some records to be highlighted, but
> > > > > > > > not all, and then it caused memory problems on the server.
> > > > > > > > Performance is also very poor.
> > > > > > >
> > > > > > > I've been thinking hl.maxAnalyzedChars should maybe default to
> > > > > > > no limit -- it's a performance threshold but perhaps better to
> > > > > > > opt-in to such a limit than scratch your head for a long time
> > > > > > > wondering why a search result isn't showing highlights.
> > > > > > >
> > > > > > > > To try to fix this I've tried to configure the unified
> > > > > > > > highlighter in my solrconfig.xml instead. It seems to be
> > > > > > > > working but again I'm missing some highlighted records.
> > > > > > >
> > > > > > > There is no configuration of that highlighter in
> > > > > > > solrconfig.xml; it's entirely parameter driven (runtime).
> > > > > > >
> > > > > > > > The other thing is I've tried to adjust my unified
> > > > > > > > highlighting settings in solrconfig.xml and they don't seem
> > > > > > > > to be having any effect even after restarting Solr. I was
> > > > > > > > just wondering whether there is any highlighting
> > > > > > > > information stored at index time. It's taking over 4 hours
> > > > > > > > to index my records so it's not easy to keep reindexing my
> > > > > > > > content.
> > > > > > > >
> > > > > > > > Any ideas on how to handle highlighting of large content
> > > > > > > > would be appreciated.
> > > > > > > >
> > > > > > > > Shaun
> > > > > > >
> > > > > > > Please read the documentation here thoroughly:
> > > > > > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> > > > > > > (or earlier version as applicable)
> > > > > > >
> > > > > > > Since you have large bodies of text to highlight, you would
> > > > > > > strongly benefit from putting offsets into the search index
> > > > > > > (and re-index) -- storeOffsetsWithPositions. That's an option
> > > > > > > on the field/fieldType in your schema; it may not be obvious
> > > > > > > reading the docs. You have to opt-in to that; Solr doesn't
> > > > > > > normally store any info in the index for highlighting.
> > > > > > >
> > > > > > > ~ David Smiley
> > > > > > > Apache Lucene/Solr Search Developer
> > > > > > > http://www.linkedin.com/in/davidwsmiley
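
Since the thread settles on driving the unified highlighter purely through request parameters, here is a rough sketch of what such a request might look like when built from Python. The host, core name ("mycore"), field name ("content") and the helper function are all made up for illustration; the 100,000,000 character cap is only a stand-in for David's "100MB worth of text" suggestion, not a tested recommendation. It assumes the field was already indexed with storeOffsetsWithPositions enabled in the schema.

```python
from urllib.parse import urlencode


def unified_highlight_url(base_url, query, field,
                          max_chars=100_000_000, weight_matches=None):
    """Build a Solr /select URL asking for unified-highlighter snippets.

    base_url, field and the default max_chars are illustrative assumptions.
    """
    params = {
        "q": query,
        "defType": "dismax",           # the thread uses the dismax parser
        "qf": field,
        "hl": "true",                  # hl=true/false switches highlighting
        "hl.method": "unified",        # select the unified highlighter
        "hl.fl": field,
        "hl.maxAnalyzedChars": str(max_chars),
    }
    # hl.weightMatches is only sent when explicitly requested, since
    # older Solr versions (such as the 7.4 in the thread) predate it.
    if weight_matches is not None:
        params["hl.weightMatches"] = "true" if weight_matches else "false"
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and core name.
url = unified_highlight_url("http://localhost:8983/solr/mycore",
                            "stroke rehabilitation", "content")
print(url)
```

The same parameters could equally go into the defaults of a request handler in solrconfig.xml, which is the arrangement David describes for setting highlighting parameters there.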