Arturas:

Thanks for the "atta boy's", but I have to confess I poked a
developer's list and the person (David Smiley) who, you know, like
understands the highlighting code replied, and I passed it on ;

I have great respect for the SO forum, but don't post to it since
there's only so much time in a day, so please feel free to put that
explanation over there.

As for the rest, I'll have to pass today, the aforementioned time
constraints are calling....

Best,
Erick

On Mon, Mar 26, 2018 at 12:12 AM, Arturas Mazeika <maze...@gmail.com> wrote:
> Hi Erick,
>
> Adding a field qualifier to the hl.q parameter solved the issue. My
> excitement is going through the roof! What a thorough answer: it explains
> how Solr behaves and how it tries to interpret what I mean when I supply a
> keyword without the field qualifier. Very impressive.
> Would you care to (re)post this answer to Stack Overflow? If that is too
> much of a hassle, I'll do this in a couple of days myself on your behalf.
>
> I am impressed by how well, thoroughly, and quickly the question was answered.
>
> Stefan's hint pushed me further in this direction: he suggested using the
> query part of Solr to filter and sort out the relevant answers in a first
> step and, in a second step, highlighting all the keywords using CTRL+F (in
> the browser or some alternative viewer). This brought me to the next
> question:
>
> How can one match query terms against the analyzed documents in an
> efficient and distributed manner? My current understanding of how to
> achieve this is the following:
>
> 1. Get the list of ids (contents) of the documents that match the query
> 2. Use the http://localhost:8983/solr/#/trans/analysis to re-analyze the
> document and the query
> 3. Use the mapping from substrings of the original text to the output of
> the last filter/tokenizer in the analysis chain to match the query terms
> 4. Emulate CTRL+F highlighting
>
> The Solr web interface offers quite a bit toward this goal. If
> one fires this request:
>
> * analysis.fieldvalue=Albert Einstein (14 March 1879 – 18 April 1955) was a
> German-born theoretical physicist[5] who developed the theory of
> relativity, one of the two pillars of modern physics (alongside quantum
> mechanics).&
> * analysis.query=reletivity theory
>
> to one of the cores of solr, one gets the steps 1-3 done:
>
> http://localhost:8983/solr/trans_shard1_replica_n1/analysis/field?wt=xml&analysis.showmatch=true&analysis.fieldvalue=Albert%20Einstein%20(14%20March%201879%20%E2%80%93%2018%20April%201955)%20was%20a%20German-born%20theoretical%20physicist[5]%20who%20developed%20the%20theory%20of%20relativity,%20one%20of%20the%20two%20pillars%20of%20modern%20physics%20(alongside%20quantum%20mechanics).&analysis.query=reletivity%20theory&analysis.fieldtype=text_en
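
For reference, a request like the one above can also be assembled programmatically, which avoids hand-encoding the field value. This is only a minimal sketch in Python using the standard library; the helper name build_analysis_url and the shortened field value are illustrative assumptions, while the parameter names, core name and field type are the ones from the URL above.

```python
from urllib.parse import urlencode


def build_analysis_url(base_url, core, field_type, field_value, query):
    """Build a field-analysis request URL (hypothetical helper, not a Solr API)."""
    params = {
        "wt": "xml",
        "analysis.showmatch": "true",
        "analysis.fieldvalue": field_value,
        "analysis.query": query,
        "analysis.fieldtype": field_type,
    }
    # urlencode percent-encodes the values and turns spaces into '+'.
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


url = build_analysis_url(
    "http://localhost:8983/solr",
    "trans_shard1_replica_n1",   # core name as used in the thread
    "text_en",
    "Albert Einstein developed the theory of relativity.",  # shortened sample text
    "reletivity theory",
)
print(url)
```

The sketch only builds the URL; it does not send the request or interpret the response.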
>
> Questions:
>
> 1. Is there a way to "load-balance" this? In the above URL, I need to
> specify a specific core. Is it possible to generalize it, so that the core
> that receives the request is not necessarily the one that processes it? Or
> is this already distributed, in the sense that the receiving core and the
> processing core need not be the same?
>
> 2. The document has already been run through the analysis chain. Is it
> possible to store this information so that one does not need to re-analyze it?
>
> Cheers
> Arturas
>
> On Fri, Mar 23, 2018 at 9:15 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Arturas:
>>
>> Try to field-qualify your hl.q parameter. That looks like:
>>
>> hl.q=trans:Kundigung
>> or
>> hl.q=trans:Kündigung
>>
>> I saw the exact behavior you describe when I did _not_ specify the
>> field in the hl.q parameter, i.e.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> didn't show all highlights.
>>
>> But when I did specify the field, it worked.
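
A minimal Python sketch of building such a request with a field-qualified hl.q. The helper name build_highlight_url is hypothetical; the host, collection, field name and parameters are the ones from the URLs quoted earlier in this thread.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, collection, search_field, search_term,
                        hl_field, hl_term, snippets=3):
    """Build a select URL whose hl.q is qualified with the highlighted field."""
    params = {
        "q": f"{search_field}:{search_term}",
        "hl": "true",
        "hl.fl": hl_field,
        # Field-qualify hl.q so it is analyzed with hl_field's chain
        # rather than with the default search field's chain.
        "hl.q": f"{hl_field}:{hl_term}",
        "hl.snippets": snippets,
        "wt": "xml",
        "rows": 1,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_highlight_url("http://localhost:8983/solr", "trans",
                          "trans", "Zeit", "trans", "Kündigung")
print(url)
```

Letting urlencode handle the encoding also takes care of the umlaut (ü becomes %C3%BC), so the term does not have to be encoded by hand.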
>>
>> Here's what I think is happening: Solr uses the default search
>> field when parsing an un-field-qualified query. I.e.
>>
>> q=something
>>
>> is parsed as
>>
>> q=default_search_field:something.
>>
>> The default field is controlled in solrconfig.xml with the "df"
>> parameter; you'll see entries like:
>> <str name="df">my_field</str>
>>
>> Also when I changed the "df" parameter to the field I was highlighting
>> on, I didn't need to specify the field on the hl.q parameter.
>>
>> hl.q=Kundigung
>> or
>> hl.q=Kündigung
>>
>> The default field is usually "text", which knows nothing about
>> the German-specific filters you've applied unless you changed it.
>>
>> So in the absence of a field-qualification for the hl.q parameter Solr
>> was parsing the query according to the analysis chain specified
>> in your default field, and probably passed ü through without
>> transforming it. Since your indexing analysis chain for that field
>> folded ü to just plain u, it wasn't found or highlighted.
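
The mismatch described above can be illustrated with a toy Python sketch. It is not Solr's analysis code: simple_fold is a simplified stand-in for the umlaut folding done by the German chain (the real chain in the schema also stems), and lowercase_only assumes, as hypothesized above, that the default field's chain leaves ü untouched.

```python
def simple_fold(token: str) -> str:
    """Toy stand-in for the German field's chain: lowercase and fold umlauts."""
    table = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
    return token.lower().translate(table)


def lowercase_only(token: str) -> str:
    """Toy stand-in for a default field whose chain only lowercases."""
    return token.lower()


indexed_term = simple_fold("Kündigung")           # term as stored in the index
unqualified_hl = lowercase_only("Kündigung")      # hl.q parsed with the default field
qualified_hl = simple_fold("Kündigung")           # hl.q parsed with the German field

print(indexed_term)                    # kundigung
print(unqualified_hl == indexed_term)  # False: nothing to highlight
print(qualified_hl == indexed_term)    # True: highlight found
```

Under these assumptions, an unqualified hl.q=Kundigung happens to match because it already equals the folded form, which is consistent with the behavior reported in the original question.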
>>
>> On the surface, this does seem like something that should be
>> changed. I'll go ahead and ping the dev list.
>>
>> NOTE: I was trying this on Solr 7.1
>>
>> Best,
>> Erick
>>
>> On Fri, Mar 23, 2018 at 12:03 PM, Arturas Mazeika <maze...@gmail.com>
>> wrote:
>> > Hi Erick,
>> >
>> > Thanks for the update and the info. Your post shed quite a bit of light
>> > on the picture, and now I understand quite a bit more of what you are
>> > saying. Your explanation makes sense and can be quite useful in certain
>> > scenarios.
>> >
>> > What struck me in your description is that the analyzer chain needs to
>> > be applied to the highlighting queries as well. The tragedy is that I am
>> > not able to get this for a German collection: if only the query is set
>> > (no explicit highlighting query), the highlighting is correct. It is also
>> > correct if I replace the umlauts with the corresponding Latin characters.
>> > Getting the analyzer chain applied to the highlighting terms remains the
>> > challenge.
>> >
>> > Could you have a look at the following Stack Overflow link? Maybe
>> > something comes to mind...
>> >
>> > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >
>> > Cheers,
>> >
>> > Arturas
>> > On Fri, Mar 23, 2018, 17:43 Erick Erickson <erickerick...@gmail.com>
>> wrote:
>> >
>> >> bq: this is not a typical case that one searches for a keyword but
>> >> highlights something else
>> >>
>> >> This isn't really an unusual case; apparently I misled you.
>> >>
>> >> What I was trying to convey is that the analysis chain used is firmly
>> >> attached to a particular _field_. There's no way to say "use one
>> >> analysis chain for the query and another for highlighting on the
>> >> _same_ field".
>> >>
>> >> You can use two different fields with different analysis chains, one
>> >> for each purpose. So something like
>> >>
>> >> q=f1:something&hl.fl=f2,f3&hl.q=other
>> >>
>> >> is certainly reasonable. It'll search for "something" in f1, and
>> >> highlight "other" in f2 and f3
>> >>
>> >> Each field processes its input with the analysis chain defined in the
>> >> schema.
>> >>
>> >> The rest about stored="true" can be ignored, it's just me wandering
>> >> off into the weeds about an optimization that only stores the data
>> >> once rather than redundantly in multiple fields.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Mar 23, 2018 at 4:37 AM, Arturas Mazeika <maze...@gmail.com>
>> >> wrote:
>> >> > Hi Mathesis (Stefan),
>> >> >
>> >> > Thanks for the questions. They made me look at the problem from a
>> >> > distance and re-frame the situation. Good questions indeed.
>> >> >
>> >> > To approach it from another angle: consider a user who describes
>> >> > herself as a BMW fan and is convinced (for the sake of argument) that
>> >> > all BMWs need to be the blackest color possible. She would like to
>> >> > search and later browse the entries in a discussion forum (not
>> >> > everything, of course, only BMWs of the blackest color), and what
>> >> > interests her are the snippets that contain keywords such as
>> >> > "understood" or "craziest" (because she is looking for a dozen
>> >> > discussions that she saw before).
>> >> >
>> >> > What I have not been able to achieve so far is (i) combining query
>> >> > terms for filtering and highlighting, and (ii) using the analyzer chain
>> >> > of the field to rewrite the highlight query (or defining one in the
>> >> > search).
>> >> >
>> >> > The CTRL+F technique is a very powerful one, indeed, and works most of
>> >> > the time. The difficulties with it are query rewriting, enriching, etc.
>> >> >
>> >> > Cheers,
>> >> > Arturas
>> >> >
>> >> > On Fri, Mar 23, 2018 at 11:29 AM, Stefan Matheis <
>> >> matheis.ste...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Perhaps we try it the other way round .. what's your use case for
>> >> >> this? I'm trying to think of a situation where I'd need this as a
>> >> >> user.
>> >> >>
>> >> >> The only reason I see myself doing this is CTRL+F in a page when the
>> >> >> search result is not immediately visible to me ;)
>> >> >>
>> >> >> On Mar 23, 2018 9:41 AM, "Arturas Mazeika" <maze...@gmail.com>
>> wrote:
>> >> >>
>> >> >> > Hi Erick et al,
>> >> >> >
>> >> >> > From your answer I understand that it is not a typical case that one
>> >> >> > searches for one keyword but highlights something else. Since we have
>> >> >> > two parameters (q vs hl.q), I thought they were freely combinable. From
>> >> >> > your answer I understand that this is not really the case. My current
>> >> >> > understanding came from [1], which says:
>> >> >> >
>> >> >> > hl.q
>> >> >> >
>> >> >> > A query to use for highlighting. This parameter allows you to
>> >> >> > highlight different terms than those being used to retrieve documents.
>> >> >> >
>> >> >> > What I hear from you is something different: that it is not enough
>> >> >> > just to combine q with hl.q, and that there are caveats to achieving
>> >> >> > the task (multiple fields, FastVectorHighlighter).
>> >> >> >
>> >> >> > Your information is very helpful.
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Arturas
>> >> >> >
>> >> >> > [1]  https://lucene.apache.org/solr/guide/7_2/highlighting.html
>> >> >> >
>> >> >> > On Thu, Mar 22, 2018 at 4:07 PM, Erick Erickson <
>> >> erickerick...@gmail.com
>> >> >> >
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Basically you need to use a copyField, but in several variants:
>> >> >> > >
>> >> >> > > If you use the field _exclusively_ for highlighting, then store the
>> >> >> > > raw content there and have the field use whatever analyzer you want.
>> >> >> > > You do _not_ need to have indexed="true" set for the field if you're
>> >> >> > > highlighting on the fly. So you're searching against field1 (which
>> >> >> > > has indexed="true" stored="false" set) but highlighting against
>> >> >> > > field2 (which has indexed="false" stored="true" set). Of course, any
>> >> >> > > time you want to return the contents in a doc, your fl needs to
>> >> >> > > specify field2...
>> >> >> > >
>> >> >> > > The above does not bloat your index at all, since the cost of
>> >> >> > > stored="true" indexed="true" is the same as if you use two fields,
>> >> >> > > each with only one option turned on.
>> >> >> > >
>> >> >> > > The second approach, if you want to use FastVectorHighlighter or the
>> >> >> > > like, is simply to index both fields.
>> >> >> > >
>> >> >> > > Best,
>> >> >> > > Erick
>> >> >> > >
>> >> >> > > On Thu, Mar 22, 2018 at 2:18 AM, Arturas Mazeika <
>> maze...@gmail.com
>> >> >
>> >> >> > > wrote:
>> >> >> > > > Hi Solr-Users,
>> >> >> > > >
>> >> >> > > > I've been playing with a German collection of documents, where I
>> >> >> > > > tried to search for one word (q=Tag) and highlight another
>> >> >> > > > (hl.q=Kundigung). Is this a "legal" use case? My key question is:
>> >> >> > > > how can I tell Solr which query analyzer to use for highlighting?
>> >> >> > > > Strictly speaking, I should use hl.q=Kündigung to conceptually look
>> >> >> > > > for relevant information, but in this case no highlighting is
>> >> >> > > > returned (as all umlauts are left out in the index).
>> >> >> > > >
>> >> >> > > > Additional infos:
>> >> >> > > >
>> >> >> > > > solr version: 7.2
>> >> >> > > > urls to query:
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=Kundigung&hl.snippets=3&wt=xml&rows=1
>> >> >> > > >
>> >> >> > > > http://localhost:8983/solr/trans/select?q=trans:Zeit&hl=true&hl.fl=trans&hl.q=K%C3%BCndigung&hl.snippets=3&wt=xml&rows=1
>> >> >> > > >
>> >> >> > > > Managed-schema:
>> >> >> > > >
>> >> >> > > >   <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>> >> >> > > >     <analyzer>
>> >> >> > > >       <tokenizer class="solr.StandardTokenizerFactory"/>
>> >> >> > > >       <filter class="solr.LowerCaseFilterFactory"/>
>> >> >> > > >       <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
>> >> >> > > >       <filter class="solr.GermanNormalizationFilterFactory"/>
>> >> >> > > >       <filter class="solr.GermanLightStemFilterFactory"/>
>> >> >> > > >     </analyzer>
>> >> >> > > >   </fieldType>
>> >> >> > > >
>> >> >> > > >
>> >> >> > > > Other additional infos:
>> >> >> > > > https://stackoverflow.com/questions/49276093/solr-highlighting-terms-with-umlaut-not-found-not-highlighted
>> >> >> > > >
>> >> >> > > > Cheers,
>> >> >> > > > Arturas
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
