Re: unified highlighter performance in solr 8.5.1

2020-07-05 Thread David Smiley
Here's my PR, which includes some edits to the ref guide docs where I tried to clarify these settings a little too. https://github.com/apache/lucene-solr/pull/1651 ~ David On Sat, Jul 4, 2020 at 8:44 AM Nándor Mátravölgyi wrote: > I guess that's fair. Let's have hl.fragsizeIsMinimum=true as

Re: unified highlighter performance in solr 8.5.1

2020-07-04 Thread Nándor Mátravölgyi
I guess that's fair. Let's have hl.fragsizeIsMinimum=true as default. On 7/4/20, David Smiley wrote: > I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of > quality of the highlight since there are vastly more breaks to pick from. > I think that setting is more useful in

Re: unified highlighter performance in solr 8.5.1

2020-07-03 Thread David Smiley
I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of quality of the highlight since there are vastly more breaks to pick from. I think that setting is more useful in SENTENCE mode if you can stand the perf hit. If you agree, then why not just let this one default to "true"?

Re: unified highlighter performance in solr 8.5.1

2020-07-03 Thread Nándor Mátravölgyi
Since the issue seems to be affecting the highlighter differently based on which mode it is using, having different defaults for the modes could be explored. WORD may have the new defaults as it has little effect on performance and it creates nicer highlights. SENTENCE should have the defaults

Re: unified highlighter performance in solr 8.5.1

2020-07-03 Thread David Smiley
I think we should flip the default of hl.fragsizeIsMinimum to be 'true', thus have the behavior close to what preceded 8.5. (a) it was very recently (<= 8.4) the previous behavior and so may require less tuning for users in 8.6 henceforth (b) it's significantly faster for long text -- seems to be
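
A rough illustration of the trade-off discussed here, as a sketch only: this is not Lucene's LengthGoalBreakIterator code, and the function names and sample break offsets are hypothetical. It assumes a sorted, non-empty list of break offsets and contrasts treating fragsize as a minimum (one forward lookup) with picking the break closest to the target (a forward and a backward lookup), which is one plausible reason the second mode costs more on long text with sparse sentence breaks.

```python
from bisect import bisect_left


def pick_break_min_length(breaks, start, fragsize):
    """Return the first break at or after start + fragsize (forward lookup only)."""
    target = start + fragsize
    i = bisect_left(breaks, target)
    # Fall back to the last break when the target lies past the end of the text.
    return breaks[i] if i < len(breaks) else breaks[-1]


def pick_break_closest(breaks, start, fragsize):
    """Return the break nearest to start + fragsize (forward and backward lookup)."""
    target = start + fragsize
    i = bisect_left(breaks, target)
    after = breaks[i] if i < len(breaks) else breaks[-1]
    before = breaks[i - 1] if i > 0 else after
    # Never return a break at or before the fragment start.
    if before <= start:
        return after
    # Tie-breaking toward the earlier break is an arbitrary choice in this sketch.
    return before if target - before <= after - target else after


# Hypothetical sentence boundaries (character offsets) in a document.
sentence_breaks = [0, 40, 95, 180, 260]

min_length_end = pick_break_min_length(sentence_breaks, 0, 100)
closest_end = pick_break_closest(sentence_breaks, 0, 100)
```

With the minimum-length rule the fragment runs to the next sentence boundary past the target; with the closest rule it may stop just short of it.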

Re: unified highlighter performance in solr 8.5.1

2020-06-19 Thread Nándor Mátravölgyi
Hi! With the provided test I've profiled the preceding() and following() calls on the base Java iterators in the different options. === default highlighter arguments === Calling the test query with SENTENCE base iterator: - from LengthGoalBreakIterator.following(): 1130 calls of

Re: unified highlighter performance in solr 8.5.1

2020-06-08 Thread Michal Hlavac
Hi David, sorry for my late answer. I created simple test scenarios on GitHub: https://github.com/hlavki/solr-unified-highlighter-test There are 2 documents, both fairly large. Test method:

Re: unified highlighter performance in solr 8.5.1

2020-05-28 Thread Nándor Mátravölgyi
Hi! I've not been able to delve into this issue deeply, but it could be useful to know that "fragsizeIsMinimum" and "fragAlignRatio" are new parameters which have behavior-changing default values. Leaving those with their default values makes the comparison between 8.4 and 8.5 like apples to

Re: unified highlighter performance in solr 8.5.1

2020-05-27 Thread David Smiley
Try setting hl.fragsizeIsMinimum=true. I did some benchmarking and found that this helps quite a bit. BTW I used the highlights.alg benchmark file, with some changes to make it more reflective of your scenario -- offsets in postings, and used "enwiki" (English Wikipedia) docs which are larger than
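
As a minimal sketch of the suggestion above, the request parameters could be assembled like this. The helper name and the sample query are hypothetical; the parameter names (hl, hl.method, hl.fl, hl.bs.type, hl.fragsizeIsMinimum) and the field name content_txt_sk_highlight are the ones mentioned in this thread. Nothing is sent to a server here; the code only builds the query string.

```python
from urllib.parse import urlencode


def build_highlight_params(query, field, bs_type="SENTENCE", fragsize_is_minimum=True):
    """Build Solr request parameters for the unified highlighter (hypothetical helper)."""
    return {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.bs.type": bs_type,
        # The setting suggested in this thread as a workaround for slow SENTENCE mode.
        "hl.fragsizeIsMinimum": "true" if fragsize_is_minimum else "false",
    }


# The query text is a placeholder; the field name comes from the original report.
params = build_highlight_params("content:solr", "content_txt_sk_highlight")
query_string = urlencode(params)
```

The resulting query_string can be appended to a collection's select URL to compare response times with and without the setting.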

Re: unified highlighter performance in solr 8.5.1

2020-05-26 Thread Michal Hlavac
Fine, I'll try to write a simple test, thanks. On Tuesday, 26 May 2020 17:44:52 CEST David Smiley wrote: > Please create an issue. I haven't reproduced it yet but it seems unlikely > to be user-error. > > ~ David > > > On Mon, May 25, 2020 at 9:28 AM Michal Hlavac wrote: > > > Hi, > > > > I

Re: unified highlighter performance in solr 8.5.1

2020-05-26 Thread David Smiley
Please create an issue. I haven't reproduced it yet but it seems unlikely to be user-error. ~ David On Mon, May 25, 2020 at 9:28 AM Michal Hlavac wrote: > Hi, > > I have field: > stored="true" indexed="false" storeOffsetsWithPositions="true"/> > > and configuration: > true > unified > true

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Yes, I have no problems in 8.4.1, only in 8.5.1. Also yes, those are multi-page PDF files. m. On Monday, 25 May 2020 19:11:31 CEST David Smiley wrote: > Wow that's terrible! > So this problem is for SENTENCE in particular, and it's a regression in > 8.5? I'll see if I can reproduce this with the

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread David Smiley
Wow that's terrible! So this problem is for SENTENCE in particular, and it's a regression in 8.5? I'll see if I can reproduce this with the Lucene benchmark module. I figure you have some meaty text, like "page" size or longer? ~ David On Mon, May 25, 2020 at 10:38 AM Michal Hlavac wrote: >

Re: unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
I did the same test on Solr 8.4.1 and response times are the same for both hl.bs.type=SENTENCE and hl.bs.type=WORD. m. On Monday, 25 May 2020 15:28:24 CEST Michal Hlavac wrote: Hi, I have field: and configuration: true unified true content_txt_sk_highlight 2 true Doing query with

unified highlighter performance in solr 8.5.1

2020-05-25 Thread Michal Hlavac
Hi, I have field: and configuration: true unified true content_txt_sk_highlight 2 true A query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is really slow. The same query with hl.bs.type=WORD takes 8-45 ms. Is this normal behaviour, or should I create an issue? Thanks, m.