Huy Le created LUCENE-8800:
------------------------------

             Summary: FieldsReader#terms poor performance on an index with many 
fields
                 Key: LUCENE-8800
                 URL: https://issues.apache.org/jira/browse/LUCENE-8800
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs
    Affects Versions: 8.0
            Reporter: Huy Le
         Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png

We have experienced poor performance on an index with many fields whose names 
share a common prefix. Stack sampling with JProfiler showed a hotspot in the 
method FieldsReader#terms.

!Screen Shot 2019-05-15 at 5.08.26 pm.png!

Looking at the source code, I saw that a TreeMap is used to map field names to 
FieldsProducer instances, which means each lookup incurs O(log N) string 
comparisons.

{code:java}
private static class FieldsReader extends FieldsProducer {
    ...    
    private final Map<String,FieldsProducer> fields = new TreeMap<>();
    ...
    @Override
    public Terms terms(String field) throws IOException {
      FieldsProducer fieldsProducer = fields.get(field);
      return fieldsProducer == null ? null : fieldsProducer.terms(field);
    }
{code}
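
To make the lookup cost concrete, here is a minimal standalone sketch (not Lucene code) that counts how many key comparisons a single TreeMap#get performs. The class name TreeMapLookupCost, the counting comparator, and the Integer values standing in for FieldsProducer are all assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; it only mimics the shape of the field map in FieldsReader.
final class TreeMapLookupCost {

    /** Returns how many key comparisons one TreeMap#get performs on a map of n fields. */
    static long comparisonsForOneLookup(int n) {
        AtomicLong comparisons = new AtomicLong();
        // Same ordering as String#compareTo, but every call is counted.
        Comparator<String> counting = (a, b) -> {
            comparisons.incrementAndGet();
            return a.compareTo(b);
        };
        Map<String, Integer> fields = new TreeMap<>(counting);
        for (int i = 0; i < n; i++) {
            fields.put("customfield_" + i, i); // assumed naming scheme, as in customfield_*
        }
        comparisons.set(0); // ignore the comparisons made while building the map
        fields.get("customfield_" + (n - 1));
        return comparisons.get();
    }

    public static void main(String[] args) {
        System.out.println("comparisons per lookup: " + comparisonsForOneLookup(6000));
    }
}
```

Because TreeMap is a red-black tree, the count stays within roughly 2*log2(N+1), but every one of those comparisons is a String#compareTo call on two field names.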

The problem becomes much worse when field names are long and share a common 
prefix, because each comparison has to scan the whole shared prefix before it 
reaches a differing character.
In our case, the index has around 6000 fields of the form customfield_*. I 
wonder if we can change the TreeMap to a HashMap, or to a LinkedHashMap 
populated in sorted order if we want to preserve the sorted iteration order, to 
improve the situation.
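
A minimal sketch of the LinkedHashMap idea, under the assumption that the map is fully built before any lookups happen. The class HashedFieldLookup and the helper toHashedSortedMap are hypothetical names, and the generic type V stands in for FieldsProducer; this is not the actual Lucene change.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
final class HashedFieldLookup {

    /**
     * Copies a sorted map into a LinkedHashMap. The copy iterates in the source's
     * (sorted) order because LinkedHashMap keeps insertion order, while get()
     * uses hashing instead of a chain of String comparisons.
     */
    static <V> Map<String, V> toHashedSortedMap(SortedMap<String, V> sorted) {
        return Collections.unmodifiableMap(new LinkedHashMap<>(sorted));
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> sorted = new TreeMap<>();
        sorted.put("customfield_b", 2);
        sorted.put("customfield_a", 1);
        Map<String, Integer> fields = toHashedSortedMap(sorted);
        System.out.println(fields.keySet());            // keys in sorted order
        System.out.println(fields.get("customfield_b")); // hashed lookup
    }
}
```

The trade-off is that the sorted order is only guaranteed as long as no entries are added after the copy, which is why the sketch returns an unmodifiable view.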


