Re: hl.preserveMulti in Unified highlighter?

2020-05-23 Thread David Smiley
Better late than never?  I added some new mail filters to bring topics of
interest to my attention.

Anyway, this seems like an important use case.

Anthony:  You'd probably benefit from also setting hl.bs.type=WHOLE since
clearly you want whole values (no snippets/fragments of values).  If I get
around to implementing hl.preserveMulti for the UH, I'll have it make this
assumption likewise.
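
For anyone trying this out, here is a minimal sketch of a request that
combines the unified highlighter with hl.bs.type=WHOLE. The host, collection
name, query and field name are made up for illustration; only the hl.*
parameter names are taken from the Solr highlighting docs.

```python
from urllib.parse import urlencode

# Hypothetical host and collection; adjust to your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_highlight_url(query, field):
    """Return a select URL that asks the unified highlighter for whole values."""
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",  # select the UnifiedHighlighter
        "hl.fl": field,          # field to highlight (hypothetical name below)
        "hl.bs.type": "WHOLE",   # whole values, no snippets/fragments
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_highlight_url("title:solr", "authors")
print(url)
```

Note that this only controls fragmenting. It does not by itself bring back
the values that have no highlight.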

~ David




Re: hl.preserveMulti in Unified highlighter?

2020-05-23 Thread Walter Underwood
I’m a little amused that this thread has become active after almost two months 
of silence.

I think we just used the old highlighter. I don’t even remember now.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: hl.preserveMulti in Unified highlighter?

2020-05-23 Thread Anthony Groves
Hi Walter,

I did something very similar to what David is suggesting when switching
from the PostingsHighlighter to the UnifiedHighlighter in Solr 7.

In order to include non-highlighted items (exact ordering) when using
preserveMulti, we used a custom PassageFormatter that ignored the start and
end offsets:
https://github.com/oreillymedia/ifpress-solr-plugin/blob/bf3b07c5be32fbcfa7b6fdfd439d511ef60dab68/src/main/java/com/ifactory/press/db/solr/highlight/HighlightFormatter.java#L35

I was actually surprised to see not much of a performance hit from
essentially removing the offset usage, but our highlighted fields aren't
extremely large :-)

Hope that helps!
Anthony

*Anthony Groves*  | Technical Lead, Search

O'Reilly Media, Inc.  | https://www.linkedin.com/in/anthonygroves/




Re: hl.preserveMulti in Unified highlighter?

2020-05-22 Thread David Smiley
Hi Walter,

No, the UnifiedHighlighter does not behave as if this setting were true.

The docs say:

`hl.preserveMulti`::
If `true`, multi-valued fields will return all values in the order they
were saved in the index. If `false`, the default, only values that match
the highlight request will be returned.


The first sentence there is the essence of it.  Notice it's not conditional
on whether there are highlights or not.  The UH won't return values lacking
a highlight. Even hl.defaultSummary isn't triggered because *some* of the
values have a highlight.
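
To make the difference concrete, here is a toy model of which values come
back under each setting. It is not Solr code; the function and argument names
are invented for illustration, and it only encodes the documented wording
quoted above.

```python
def returned_values(values, has_highlight, preserve_multi):
    """Model which values of a multi-valued field are returned.

    values         -- field values in index order
    has_highlight  -- parallel list of booleans, True where a value matched
    preserve_multi -- the documented meaning of hl.preserveMulti
    """
    if preserve_multi:
        # All values, in the order they were saved in the index.
        return list(values)
    # Only values that match the highlight request.
    return [v for v, hit in zip(values, has_highlight) if hit]


vals = ["first", "second", "third"]
hits = [True, False, True]
print(returned_values(vals, hits, preserve_multi=True))
print(returned_values(vals, hits, preserve_multi=False))
```

The second call corresponds to what the UH does today: the value in the
middle is dropped, so positions in the result no longer line up with
positions in the stored field.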

As I look at the pertinent code right now, I imagine a solution would be to
provide a custom PassageFormatter.  If we can assume for this use-case that
you can use hl.bs.type=WHOLE as well, then a simpler PassageFormatter
could basically ignore the passage starts & ends and merely mark up the
original content in its entirety, which is a null-separated concatenation of
all the values for this field for a document.
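
A rough sketch of that idea, written as a standalone model rather than
against the Lucene API: it assumes the values are joined with a NUL
character, that match offsets are end-exclusive positions in the joined
content, and that matches never overlap or cross a separator. All names here
are hypothetical; a real implementation would live in a PassageFormatter
subclass.

```python
SEP = "\x00"  # assumed separator between the values of one field


def mark_up_all_values(values, matches, pre="<em>", post="</em>"):
    """Mark up matches over the whole content and return every value.

    values  -- field values in index order
    matches -- (start, end) offsets into the joined content, end exclusive
    Returns one string per value, in the original order, highlighted or not.
    """
    content = SEP.join(values)
    out = []
    pos = 0
    for start, end in sorted(matches):
        out.append(content[pos:start])                # untouched text before the match
        out.append(pre + content[start:end] + post)   # the marked-up match
        pos = end
    out.append(content[pos:])                         # tail after the last match
    # Splitting on the separator recovers the individual values in order.
    return "".join(out).split(SEP)


values = ["apache solr", "lucene", "solr highlighter"]
# "solr" sits at offsets 7-11 and 19-23 of the joined content.
result = mark_up_all_values(values, [(7, 11), (19, 23)])
print(result)
```

Because passage boundaries are ignored, the value with no match ("lucene")
is still returned in its original position.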

~ David




Re: hl.preserveMulti in Unified highlighter?

2019-03-29 Thread Walter Underwood
We are testing 6.6.1.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: hl.preserveMulti in Unified highlighter?

2019-03-29 Thread Walter Underwood
In testing, hl.preserveMulti=true works with the unified highlighter. But the 
documentation says that the parameter is only implemented in the original 
highlighter.

Is the documentation wrong? Can we trust this to keep working with unified?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 26, 2019, at 12:08 PM, Walter Underwood  wrote:
> 
> It looks like hl.preserveMulti is only implemented in the Original 
> highlighter. Has anyone looked at doing this for the Unified highlighter?
> 
> We need to preserve order in the highlights for a multi-valued field.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org 
> http://observer.wunderwood.org/  (my blog)
>