Hello! I worked on the UnifiedHighlighter a lot and want to help you!
On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <campbell.sh...@gmail.com> wrote:

> I've been using highlighting for a while, using the original highlighter,
> and just come across a problem with fields that contain a large amount of
> text, approx 250k characters. I only have about 2,000 records but each one
> contains a journal publication to search through.
>
> What I noticed is that some records didn't return a highlight even though
> they matched on the content. I noticed the hl.maxAnalyzedChars parameter
> and increased that, but it allowed some records to be highlighted, but not
> all, and then it caused memory problems on the server. Performance is also
> very poor.
>

I've been thinking hl.maxAnalyzedChars should maybe default to no limit -- it's a performance threshold, but perhaps it's better to opt in to such a limit than to scratch your head for a long time wondering why a search result isn't showing highlights.

> To try to fix this I've tried to configure the unified highlighter in my
> solrconfig.xml instead. It seems to be working but again I'm missing some
> highlighted records.
>

There is no configuration of that highlighter in solrconfig.xml; it's entirely parameter driven (runtime).

> The other thing is I've tried to adjust my unified highlighting settings in
> solrconfig.xml and they don't seem to be having any effect even after
> restarting Solr. I was just wondering whether there is any highlighting
> information stored at index time. It's taking over 4 hours to index my
> records so it's not easy to keep reindexing my content.
>
> Any ideas on how to handle highlighting of large content would be
> appreciated.
>
> Shaun
>

Please read the documentation here thoroughly:
https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
(or an earlier version as applicable).

Since you have large bodies of text to highlight, you would strongly benefit from putting offsets into the search index (and re-indexing) -- storeOffsetsWithPositions.
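Because the unified highlighter takes its settings from request parameters rather than from solrconfig.xml, changes apply on the very next query, with no restart and no re-index (only the offsets change above needs a re-index). As an illustration, here is a minimal sketch in plain Java (standard library only) that builds such a /select query string. The core name `mycore`, the field name `content`, and the `300000` limit are hypothetical placeholders; `hl`, `hl.method`, `hl.fl` and `hl.maxAnalyzedChars` are the highlighting parameters described in the reference guide linked above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class UnifiedHighlightRequest {

    /**
     * Builds the query string for a Solr /select request asking the unified
     * highlighter to highlight one field. All highlighting settings travel
     * with the request itself; nothing here is read from solrconfig.xml.
     */
    static String buildQuery(String q, String field, int maxAnalyzedChars) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("hl", "true");                 // turn highlighting on
        params.put("hl.method", "unified");       // select the UnifiedHighlighter
        params.put("hl.fl", field);               // field(s) to highlight
        // Raise the analysis limit so matches deep inside ~250k-character
        // documents can still yield snippets (the value is illustrative).
        params.put("hl.maxAnalyzedChars", Integer.toString(maxAnalyzedChars));

        // URL-encode each key and value, then join as key=value pairs.
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Hypothetical core "mycore" and field "content"; substitute your own.
        String query = buildQuery("content:journal", "content", 300_000);
        System.out.println("/solr/mycore/select?" + query);
    }
}
```

Since the settings travel with each request, you can try different values query by query and compare the results without touching the server configuration.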
storeOffsetsWithPositions is an option on the field/fieldType in your schema; it may not be obvious from reading the docs. You have to opt in to that; Solr doesn't normally store any info in the index for highlighting.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley