[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085958#comment-16085958 ]

David Smiley commented on SOLR-10321:
-------------------------------------

I think omitting blank highlights for wildcard-specified fields is probably the way to go, and would be minor enough as to not warrant a request flag/param (configuration-itis). One aspect of doing this is creating a HashSet of hl.fl values (up front) and then, after highlighting, testing whether the field to highlight is in the set or not. If it isn't, then there's a wildcard somewhere. Patches welcome :-)

Note that doing wildcard highlights on tons of fields, assuming hl.requiredFieldMatch=false and assuming the analysis offset source, is probably relatively slow in and of itself, aside from the excessive noise of putting useless empty entries in the Solr response. The underlying UnifiedHighlighter will loop over each field to produce a separate FieldHighlighter, which separately analyzes the query to pull out pertinent terms and do other initialization. For a setup like this, it's all redundant, duplicated work per field. This could probably be addressed at the UnifiedSolrHighlighter level, but it'd be awkward, and may ideally need some support at the Lucene layer too. It would probably have limitations such that a wildcard-highlighted field would then not support per-field config options.

> Unified highlighter returns empty fields when using glob
> --------------------------------------------------------
>
> Key: SOLR-10321
> URL: https://issues.apache.org/jira/browse/SOLR-10321
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: highlighter
> Affects Versions: 6.4.2
> Reporter: Markus Jelsma
> Priority: Minor
> Fix For: 7.0
>
> {code}
> q=lama=unified=content_*
> {code}
> returns:
> {code}
> name="http://www.nu.nl/weekend/3771311/dalai-lama-inspireert-westen.html;>
> Nobelprijs Voorafgaand aan zijn bezoek aan Nederland is de dalai
> <em>lama</em> in Noorwegen om te vieren dat 25 jaar geleden de
> Nobelprijs voor de Vrede aan hem werd toegekend. Anders dan in Nederland
> wordt de dalai <em>lama</em> niet ontvangen in het Noorse
> parlement.
> {code}
> FastVector and original do not emit:
> {code}
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
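
The HashSet check David Smiley describes in the comment above (collect the hl.fl values up front, then test each highlighted field against that set) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Solr patch: the class WildcardHighlightFilter and both method names are hypothetical, and the highlights are modelled as a plain Map rather than Solr's NamedList.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of UnifiedSolrHighlighter.
final class WildcardHighlightFilter {

    // Collect the hl.fl values exactly as the client wrote them
    // (comma or whitespace separated), before any glob expansion.
    static Set<String> explicitFields(String hlFl) {
        Set<String> fields = new HashSet<>();
        for (String f : hlFl.trim().split("[,\\s]+")) {
            if (!f.isEmpty()) {
                fields.add(f);
            }
        }
        return fields;
    }

    // After highlighting: a field whose name is not literally in the
    // requested set must have come from a wildcard, so a blank result
    // for it is dropped. Explicitly named fields keep their blank entry.
    static Map<String, List<String>> omitBlankWildcardHighlights(
            Set<String> requested, Map<String, List<String>> highlights) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : highlights.entrySet()) {
            boolean namedExplicitly = requested.contains(e.getKey());
            boolean blank = e.getValue() == null || e.getValue().isEmpty();
            if (namedExplicitly || !blank) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

With hl.fl set to "title, content_*", a blank result for title would still be returned, while blank results for fields such as content_en, which only matched the glob, would be left out.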
[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082299#comment-16082299 ]

Christoph Hack commented on SOLR-10321:
---------------------------------------

I think not sending empty entries at all (even if there is a field in the document) might be a good option, since transferring and decoding the keys can take a considerable amount of time. It's always possible to look at the retrieved document to see if the field is available or not.

Unfortunately, changing the default might break some clients that currently depend on this behavior, and I am not sure it's worth breaking them (and forcing them to fix a potential performance problem). The other option would be to introduce yet another highlighting option.
[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082276#comment-16082276 ]

David Smiley commented on SOLR-10321:
-------------------------------------

If there are a great number of fields to potentially highlight (even though each doc will match very few), this could also be a performance issue, as reported in SOLR-10993 (a duplicate of this issue, really).
[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932852#comment-15932852 ]

David Smiley commented on SOLR-10321:
-------------------------------------

To avoid excessive configuration, perhaps this logic should always take effect, but be limited to those fields specified via a wildcard.
[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932842#comment-15932842 ]

Markus Jelsma commented on SOLR-10321:
--------------------------------------

Hi - it is indeed just a nuisance so far, although it will get worse if we receive data in more languages. It seems to depend on which fields have data, not on the fields that are defined in the schema. I think it would be handy not to add empty arrs to the named list somewhere.
[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932832#comment-15932832 ]

David Smiley commented on SOLR-10321:
-------------------------------------

Ah; ok. Solving this would be a bit tricky due to the division of responsibility between the UnifiedHighlighter (Lucene) and UnifiedSolrHighlighter. We would need to shim in an intermediate StoredFieldVisitor to track which fields are seen vs. not seen at all, and store it on Solr's UH subclass instance. That doesn't seem worth it to me, especially just for wildcards in "hl.fl".

What might make sense (to me) is a new boolean option to have UnifiedSolrHighlighter omit returning empty-string highlights altogether. Enabling that, though, would conceal the distinction between being unable to find any snippet in the stored text and not having any stored text to highlight in the first place. I dunno. I assume this is just a minor nuisance for you, Markus?
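
A rough sketch of the boolean option floated in the comment above, assuming the per-document highlights are available as a Map from field name to snippets. The class EmptyHighlightOption, the method buildFieldHighlights and the omitEmpty flag are hypothetical names for illustration; no such request parameter is confirmed anywhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the omitEmpty flag does not correspond to a known Solr parameter.
final class EmptyHighlightOption {

    // Builds the per-document highlight section. Null and empty-string
    // snippets are never emitted; when omitEmpty is true, a field left
    // with no snippets is skipped instead of being added as an empty entry.
    static Map<String, List<String>> buildFieldHighlights(
            Map<String, List<String>> rawSnippets, boolean omitEmpty) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rawSnippets.entrySet()) {
            List<String> nonEmpty = new ArrayList<>();
            if (e.getValue() != null) {
                for (String s : e.getValue()) {
                    if (s != null && !s.isEmpty()) {
                        nonEmpty.add(s);
                    }
                }
            }
            if (omitEmpty && nonEmpty.isEmpty()) {
                continue; // field had nothing to show
            }
            out.put(e.getKey(), nonEmpty);
        }
        return out;
    }
}
```

The trade-off mentioned in the comment shows up directly here: with omitEmpty enabled, a field with stored text but no matching snippet and a field with no stored text at all both simply vanish from the output, so a client can no longer tell them apart from the highlighting section alone.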