[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-07-13 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085958#comment-16085958
 ] 

David Smiley commented on SOLR-10321:
-

I think omitting blank highlights for wildcard-specified fields is probably the 
way to go, and it would be minor enough not to warrant a request flag/param 
(configuration-itis).  One way to do this is to create a HashSet of the hl.fl 
values up front and then, after highlighting, test whether the field being 
highlighted is in the set.  If it isn't, the field was matched by a wildcard.  
Patches welcome :-)
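
A minimal sketch of the HashSet check described above, in plain Java. It is 
not the UnifiedSolrHighlighter code: the class and method names are 
hypothetical, the field names (title, content_nl, content_en) are made up for 
illustration, and it assumes hl.fl is a comma- or whitespace-separated list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class WildcardHighlightFilter {

    /**
     * Drops blank highlights, but only for fields that were reached through a
     * wildcard, i.e. fields whose name is not a literal hl.fl value.
     */
    public static Map<String, String[]> omitBlankWildcardHighlights(
            String hlFl, Map<String, String[]> highlightsByField) {
        // Built once, up front: the literal hl.fl values (assumed separators: comma/whitespace).
        Set<String> requested = new HashSet<>(Arrays.asList(hlFl.trim().split("[,\\s]+")));

        Map<String, String[]> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : highlightsByField.entrySet()) {
            String[] snippets = e.getValue();
            boolean blank = snippets == null || snippets.length == 0;
            // A field name that is not in the set can only have come from a wildcard.
            boolean viaWildcard = !requested.contains(e.getKey());
            if (blank && viaWildcard) {
                continue; // omit the useless empty entry
            }
            kept.put(e.getKey(), snippets);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String[]> raw = new LinkedHashMap<>();
        raw.put("title", new String[0]);                               // named explicitly, blank
        raw.put("content_nl", new String[] {"de dalai <em>lama</em>"}); // wildcard, has a snippet
        raw.put("content_en", new String[0]);                          // wildcard, blank
        // prints [title, content_nl]
        System.out.println(omitBlankWildcardHighlights("title,content_*", raw).keySet());
    }
}
```

Explicitly named fields keep their blank entry, so clients that ask for a 
specific field still see it in the response; only wildcard matches are pruned.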

Note that wildcard highlighting across tons of fields (assuming 
hl.requireFieldMatch=false and an analysis offset source) is probably 
relatively slow in and of itself, aside from the noise of putting useless 
empty entries in the Solr response.  The underlying UnifiedHighlighter loops 
over each field to produce a separate FieldHighlighter, and each one 
separately analyzes the query to pull out pertinent terms and does other 
initialization.  For a setup like this, that work is redundant per field.  
This could probably be addressed at the UnifiedSolrHighlighter level, but it'd 
be awkward, and it may ideally need some support at the Lucene layer too.  It 
would probably come with the limitation that a wildcard-highlighted field 
would not support per-field config options.
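
To make the redundancy concrete, here is a toy sketch in plain Java. It does 
not use any Lucene classes: extractTerms is a stand-in for the per-field query 
analysis, and the class, method, and field names are all hypothetical. It only 
counts how often the extraction runs when done per field versus once and 
shared, which is only valid if the terms do not depend on the field (as with 
hl.requireFieldMatch=false) and no per-field options apply.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SharedQueryTerms {
    /** Counts how often the stand-in "query analysis" step runs. */
    public static int extractions = 0;

    /** Stand-in for pulling the pertinent terms out of a query; not a Lucene call. */
    public static Set<String> extractTerms(String query) {
        extractions++;
        Set<String> terms = new LinkedHashSet<>();
        for (String t : query.split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Per-field setup: every field repeats the same extraction. */
    public static List<Set<String>> perField(String query, List<String> fields) {
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            out.add(extractTerms(query));
        }
        return out;
    }

    /** Shared setup: extract once and hand the same term set to every field. */
    public static List<Set<String>> shared(String query, List<String> fields) {
        Set<String> terms = extractTerms(query);
        List<Set<String>> out = new ArrayList<>();
        for (int i = 0; i < fields.size(); i++) {
            out.add(terms);
        }
        return out;
    }
}
```

With N fields the first variant runs the extraction N times and the second 
runs it once, at the cost of every field sharing one configuration.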

> Unified highlighter returns empty fields when using glob
> 
>
>              Key: SOLR-10321
>              URL: https://issues.apache.org/jira/browse/SOLR-10321
>          Project: Solr
>       Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>       Components: highlighter
> Affects Versions: 6.4.2
>         Reporter: Markus Jelsma
>         Priority: Minor
>          Fix For: 7.0
>
>
> {code}
> q=lama&hl.method=unified&hl.fl=content_*
> {code}
> returns:
> {code}
> <lst name="http://www.nu.nl/weekend/3771311/dalai-lama-inspireert-westen.html">
> 
> 
>   Nobelprijs Voorafgaand aan zijn bezoek aan Nederland is de dalai 
> <em>lama</em> in Noorwegen om te vieren dat 25 jaar geleden de 
> Nobelprijs voor de Vrede aan hem werd toegekend. Anders dan in Nederland 
> wordt de dalai <em>lama</em> niet ontvangen in het Noorse 
> parlement. 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>   
> {code}
> FastVector and original do not emit: 
> {code}
> 
> 
> 
> 
> 
> 
> 
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-07-11 Thread Christoph Hack (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082299#comment-16082299
 ] 

Christoph Hack commented on SOLR-10321:
---

I think not sending empty entries at all (even if there is a field in the 
document) might be a good option, since transferring and decoding the keys can 
take a considerable amount of time. It's always possible to look at the 
retrieved document to see if the field is available or not. Unfortunately, 
changing the default might break some clients that are currently depending on 
this behavior and I am not sure if it's worth breaking them (and forcing them 
to fix a potential performance problem). The other option would be to introduce 
yet another highlighting option.




[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-07-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082276#comment-16082276
 ] 

David Smiley commented on SOLR-10321:
-

If there are a great number of fields to potentially highlight (even though 
each doc will match very few), this could also be a performance issue as 
reported in SOLR-10993 (a duplicate of this issue really).




[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-03-20 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932852#comment-15932852
 ] 

David Smiley commented on SOLR-10321:
-

To avoid excessive configuration, perhaps this logic should always take 
effect, but be limited to those fields specified via a wildcard.




[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-03-20 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932842#comment-15932842
 ] 

Markus Jelsma commented on SOLR-10321:
--

Hi - it is indeed just a nuisance so far, although it will get worse if we 
receive data in more languages. It seems to depend on which fields have data, 
not on the fields that are defined in the schema.

I think it would be handy not to add empty arrs to the named list somewhere.




[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob

2017-03-20 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932832#comment-15932832
 ] 

David Smiley commented on SOLR-10321:
-

Ah; ok.  Solving this would be a bit tricky due to the division of 
responsibility between the UnifiedHighlighter (Lucene) and 
UnifiedSolrHighlighter.  We would need to shim in an intermediate 
StoredFieldVisitor to track which fields are seen vs not at all, and store it 
on Solr's UH subclass instance.  That doesn't seem worth it to me, especially 
just for wildcards in "hl.fl".

What might make sense (to me) is a new boolean option to have 
UnifiedSolrHighlighter omit returning empty-string highlights altogether.  
Although enabling that would conceal the distinction of the inability to find 
any snippet from stored text, vs not having any stored text to even highlight.  
I dunno.
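
A rough sketch of what such a boolean option could look like, in plain Java. 
The flag name omitEmpty, the class, and the field names are hypothetical, not 
an existing Solr parameter or API. In this toy an empty string stands for both 
"stored text, no snippet found" and "no stored text at all".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OmitEmptyHighlights {

    /**
     * Copies the per-field highlights into the response map, skipping
     * empty-string highlights when omitEmpty is true.
     */
    public static Map<String, String> render(Map<String, String> highlights, boolean omitEmpty) {
        Map<String, String> response = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : highlights.entrySet()) {
            String snippet = e.getValue() == null ? "" : e.getValue();
            if (omitEmpty && snippet.isEmpty()) {
                continue; // dropped regardless of why the highlight is empty
            }
            response.put(e.getKey(), snippet);
        }
        return response;
    }

    public static void main(String[] args) {
        Map<String, String> highlights = new LinkedHashMap<>();
        highlights.put("content_nl", "de dalai <em>lama</em>");
        highlights.put("content_en", ""); // e.g. stored text, but no snippet found
        highlights.put("content_de", ""); // e.g. no stored text in this document
        System.out.println(render(highlights, false).keySet()); // all three fields
        System.out.println(render(highlights, true).keySet());  // only content_nl
    }
}
```

With the flag on, both kinds of empty entry vanish, so the response alone no 
longer says why a field is missing; that is the trade-off noted above.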

I assume this is just a minor nuisance for you, Markus?
