[jira] [Commented] (SOLR-10321) Unified highlighter returns empty fields when using glob
[ https://issues.apache.org/jira/browse/SOLR-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082299#comment-16082299 ]

Christoph Hack commented on SOLR-10321:
---

I think not sending empty entries at all (even if there is a field in the document) might be a good option, since transferring and decoding the keys can take a considerable amount of time. It's always possible to look at the retrieved document to see if the field is available or not.

Unfortunately, changing the default might break some clients that are currently depending on this behavior and I am not sure if it's worth breaking them (and forcing them to fix a potential performance problem). The other option would be to introduce yet another highlighting option.

> Unified highlighter returns empty fields when using glob
> ---
>
> Key: SOLR-10321
> URL: https://issues.apache.org/jira/browse/SOLR-10321
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: highlighter
> Affects Versions: 6.4.2
> Reporter: Markus Jelsma
> Priority: Minor
> Fix For: 7.0
>
> {code}
> q=lama&hl.method=unified&hl.fl=content_*
> {code}
> returns:
> {code}
> name="http://www.nu.nl/weekend/3771311/dalai-lama-inspireert-westen.html";>
> Nobelprijs Voorafgaand aan zijn bezoek aan Nederland is de dalai lama in Noorwegen om te vieren dat 25 jaar geleden de Nobelprijs voor de Vrede aan hem werd toegekend. Anders dan in Nederland wordt de dalai lama niet ontvangen in het Noorse parlement.
> {code}
> FastVector and original do not emit:
> {code}
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
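A client can be written so that it does not care whether empty highlight entries are sent: it treats a missing key like an empty list and checks the retrieved document for field presence, as the comment above suggests. The following is only a minimal Python sketch under stated assumptions. The helper names `snippets_for` and `has_field`, and the field names `content_nl` and `content_en`, are hypothetical, and the dictionaries only mimic the shape of Solr's JSON response.

```python
def snippets_for(highlighting, doc_id, field):
    """Return highlight snippets, treating a missing key like an empty list."""
    return highlighting.get(doc_id, {}).get(field, [])


def has_field(doc, field):
    """Check field availability on the retrieved document itself."""
    return field in doc


# Hypothetical data shaped like the "response" docs and "highlighting" sections.
doc = {"id": "D1", "content_nl": ["..."]}
with_empties = {"D1": {"content_nl": ["snippet"], "content_en": []}}
without_empties = {"D1": {"content_nl": ["snippet"]}}

# Both response shapes yield the same answer for the client.
for hl in (with_empties, without_empties):
    print(snippets_for(hl, "D1", "content_en"), has_field(doc, "content_en"))
```

A client built this way would keep working if the default were changed to omit empty entries.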
[jira] [Comment Edited] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082268#comment-16082268 ]

Christoph Hack edited comment on SOLR-10993 at 7/11/17 2:25 PM:
---

Thanks for your reply, but I am not asking a question... I have already looked at the source and have confirmed that it is a bug, as I have written before.

Here is a simple example to reproduce the behavior:

1. Create a new core: "bin/solr create -c bug"

2. Index some documents:
{code:title=Example Data}
{"id": "D1", "prop01_txt": "foo", "prop03_txt": "foo"}
{"id": "D2", "prop02_txt": "foo", "prop04_txt": "foo"}
{"id": "D3", "prop02_txt": "foo", "prop05_txt": "foo"}
{"id": "D4", "prop03_txt": "foo", "prop06_txt": "foo"}
{"id": "D5", "prop03_txt": "foo", "prop07_txt": "foo"}
{code}

3. Query the database with the unified highlighter:
{code:title=Query}
http://localhost:8983/solr/bug/select?hl.fl=prop*_txt&hl.method=unified&hl=on&indent=on&q=foo&wt=json
{code}
{code:title=Response}
{
  "responseHeader":{
    "status":0,
    "QTime":20,
    "params":{
      "q":"foo",
      "hl":"on",
      "indent":"on",
      "hl.fl":"prop*_txt",
      "hl.method":"unified",
      "wt":"json"}},
  "response":{"numFound":5,"start":0,"docs":[
      {
        "id":"D1",
        "prop01_txt":["foo"],
        "prop03_txt":["foo"],
        "_version_":1572635524573691904},
      {
        "id":"D2",
        "prop02_txt":["foo"],
        "prop04_txt":["foo"],
        "_version_":1572635532961251328},
      {
        "id":"D3",
        "prop02_txt":["foo"],
        "prop05_txt":["foo"],
        "_version_":1572635545661603840},
      {
        "id":"D4",
        "prop03_txt":["foo"],
        "prop06_txt":["foo"],
        "_version_":1572635551479103488},
      {
        "id":"D5",
        "prop03_txt":["foo"],
        "prop07_txt":["foo"],
        "_version_":1572635557318623232}]
  },
  "highlighting":{
    "D1":{
      "prop01_txt":["foo"],
      "prop03_txt":["foo"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D2":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["foo"],
      "prop04_txt":["foo"],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D3":{
      "prop01_txt":[],
      "prop03_txt":[],
      "prop02_txt":["foo"],
      "prop04_txt":[],
      "prop05_txt":["foo"],
      "prop06_txt":[],
      "prop07_txt":[]},
    "D4":{
      "prop01_txt":[],
      "prop03_txt":["foo"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":["foo"],
      "prop07_txt":[]},
    "D5":{
      "prop01_txt":[],
      "prop03_txt":["foo"],
      "prop02_txt":[],
      "prop04_txt":[],
      "prop05_txt":[],
      "prop06_txt":[],
      "prop07_txt":["foo"]}}}
{code}

As you can see, the highlighting response contains far too many entries. In my case, I get about 10k entries per result item, which is painfully slow.
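The empty entries in the response above can be dropped on the client after decoding. The sketch below is a hypothetical helper (`prune_empty_highlights` is not part of Solr or any client library), applied to the `D1` and `D2` entries copied from the example response. It only tidies the data after the fact and does not avoid the transfer or decoding cost that the report is about.

```python
import json


def prune_empty_highlights(highlighting):
    """Drop fields whose snippet list is empty, per document."""
    return {
        doc_id: {field: snippets for field, snippets in fields.items() if snippets}
        for doc_id, fields in highlighting.items()
    }


# The "highlighting" entries for D1 and D2 from the example response.
raw = """
{"highlighting": {
  "D1": {"prop01_txt": ["foo"], "prop03_txt": ["foo"], "prop02_txt": [],
         "prop04_txt": [], "prop05_txt": [], "prop06_txt": [], "prop07_txt": []},
  "D2": {"prop01_txt": [], "prop03_txt": [], "prop02_txt": ["foo"],
         "prop04_txt": ["foo"], "prop05_txt": [], "prop06_txt": [], "prop07_txt": []}
}}
"""

pruned = prune_empty_highlights(json.loads(raw)["highlighting"])
print(json.dumps(pruned, indent=2))
```

After pruning, each document keeps only the two fields that actually matched, which is the shape the report expects from the server.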
[jira] [Resolved] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Hack resolved SOLR-10993.
---
Resolution: Duplicate

Ah, many thanks David. I haven't seen that issue before, sorry.

> lots of empty highlight entries
> ---
>
> Key: SOLR-10993
> URL: https://issues.apache.org/jira/browse/SOLR-10993
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: highlighter
> Affects Versions: 6.6
> Reporter: Christoph Hack
>
> I have indexed documents with lots of different text fields representing different properties in Solr (version 6.6). Those text fields are indexed with storeOffsetsWithPositions=true and termVectors=true to speed up highlighting using the UnifiedHighlighter.
>
> During a search, I would like to highlight those properties and I have set hl.fl to wildcard match all properties. Everything is working fine, except that the responses are huge.
>
> Every document only has a small set of properties (let's say 10 in total, with 1-2 matching ones), but Solr returns, in the highlighting section, a dictionary with every possible property (about 10k) for every item. Nearly all of the entries are empty, but decoding the keys of the map takes a considerable amount of time.
>
> In fact, the time spent decoding these unnecessary entries is enormous. Solr takes about 174ms for the search + encoding (I expect that the timing could be much better) and decoding the response in Go (using the default JSON package from the standard library) takes 695ms.
>
> I guess the offending line is somewhere around:
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175
>
> Why is Solr generating map entries for missing values in the first place?
>
> The question had been posted on Stack Overflow before:
> https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields
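To get a rough feel for the cost described in the issue, one can compare decoding a synthetic highlighting map with about 10k mostly empty fields per document against one that contains only the matching fields. This is an illustrative Python sketch, not a reproduction of the reporter's Go measurement. The function names, the field-name pattern, and the choice of 10 documents are assumptions made for the example, and the timings will vary by machine.

```python
import json
import time


def build_highlighting(num_docs, fields_per_doc, matching=2):
    """Build a synthetic highlighting map; only `matching` fields have snippets."""
    return {
        f"D{d}": {
            f"prop{i:05d}_txt": (["foo"] if i < matching else [])
            for i in range(fields_per_doc)
        }
        for d in range(num_docs)
    }


def decode_seconds(payload):
    """Time a single json.loads call on the payload."""
    start = time.perf_counter()
    json.loads(payload)
    return time.perf_counter() - start


# Dense: every possible field is present (about 10k per document, as in the issue).
dense = json.dumps(build_highlighting(10, 10_000))
# Sparse: only the matching fields are present.
sparse = json.dumps(build_highlighting(10, 2))

print(f"dense:  {len(dense):>9} bytes, {decode_seconds(dense) * 1000:.1f} ms")
print(f"sparse: {len(sparse):>9} bytes, {decode_seconds(sparse) * 1000:.1f} ms")
```

The dense payload is several orders of magnitude larger even though both carry the same two snippets per document, which matches the report that most of the client's time goes into decoding empty keys.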
[jira] [Reopened] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Hack reopened SOLR-10993:
---
[jira] [Comment Edited] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082268#comment-16082268 ]

Christoph Hack edited comment on SOLR-10993 at 7/11/17 2:26 PM:
---
[jira] [Commented] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082268#comment-16082268 ]

Christoph Hack commented on SOLR-10993:
---
[jira] [Commented] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081991#comment-16081991 ]

Christoph Hack commented on SOLR-10993:
---

ping?
[jira] [Updated] (SOLR-10993) lots of empty highlight entries
[ https://issues.apache.org/jira/browse/SOLR-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Hack updated SOLR-10993:
---

Description:

I have indexed documents with lots of different text fields representing different properties in Solr (version 6.6). Those text fields are indexed with storeOffsetsWithPositions=true and termVectors=true to speed up highlighting using the UnifiedHighlighter.

During a search, I would like to highlight those properties and I have set hl.fl to wildcard match all properties. Everything is working fine, except that the responses are huge.

Every document only has a small set of properties (let's say 10 in total, with 1-2 matching ones), but Solr returns, in the highlighting section, a dictionary with every possible property (about 10k) for every item. Nearly all of the entries are empty, but decoding the keys of the map takes a considerable amount of time.

In fact, the time spent decoding these unnecessary entries is enormous. Solr takes about 174ms for the search + encoding (I expect that the timing could be much better) and decoding the response in Go (using the default JSON package from the standard library) takes 695ms.

I guess the offending line is somewhere around:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175

Why is Solr generating map entries for missing values in the first place?

The question had been posted on Stack Overflow before:
https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields

was:

I have indexed documents with lots of different text fields representing different properties in Solr (version 6.6). Those text fields are indexed with storeOffsetsWithPositions=true and termVectors=true to speed up highlighting using the UnifiedHighlighter.

During a search, I would like to highlight those properties and I have set hl.fl to wildcard match all properties. Everything is working fine, except that the responses are huge.

Every document only has a small set of properties (let's say 10 in total, with 1-2 matching ones), but Solr returns, in the highlighting section, a dictionary with every possible property (about 10k) for every item. Nearly all of the entries are empty, but decoding the keys of the map takes a considerable amount of time.

In fact, the time spent decoding these unnecessary entries is enormous. Solr takes about 174ms for the search + encoding (I expect that the timing could be much better) and decoding the response in Go (using the default JSON package from the standard library) takes 695ms.

I guess the offending line is:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175

Why is Solr generating map entries for missing values in the first place?

The question had been posted on Stack Overflow before:
https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields
[jira] [Created] (SOLR-10993) lots of empty highlight entries
Christoph Hack created SOLR-10993:
---

Summary: lots of empty highlight entries
Key: SOLR-10993
URL: https://issues.apache.org/jira/browse/SOLR-10993
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: highlighter
Affects Versions: 6.6
Reporter: Christoph Hack

I have indexed documents with lots of different text fields representing different properties in Solr (version 6.6). Those text fields are indexed with storeOffsetsWithPositions=true and termVectors=true to speed up highlighting using the UnifiedHighlighter.

During a search, I would like to highlight those properties and I have set hl.fl to wildcard match all properties. Everything is working fine, except that the responses are huge.

Every document only has a small set of properties (let's say 10 in total, with 1-2 matching ones), but Solr returns, in the highlighting section, a dictionary with every possible property (about 10k) for every item. Nearly all of the entries are empty, but decoding the keys of the map takes a considerable amount of time.

In fact, the time spent decoding these unnecessary entries is enormous. Solr takes about 174ms for the search + encoding (I expect that the timing could be much better) and decoding the response in Go (using the default JSON package from the standard library) takes 695ms.

I guess the offending line is:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/highlight/UnifiedSolrHighlighter.java#L175

Why is Solr generating map entries for missing values in the first place?

The question had been posted on Stack Overflow before:
https://stackoverflow.com/questions/44846220/solr-huge-and-slow-highlighting-response-with-mostly-empty-fields