Huy Le created LUCENE-8800:
------------------------------

             Summary: FieldsReader#terms poor performance on a index with many fields
                 Key: LUCENE-8800
                 URL: https://issues.apache.org/jira/browse/LUCENE-8800
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs
    Affects Versions: 8.0
            Reporter: Huy Le
         Attachments: Screen Shot 2019-05-15 at 5.08.26 pm.png

We have seen poor performance on an index with many fields whose names share a common prefix. Stack sampling with JProfiler showed a hotspot in the method FieldsReader#terms.

!Screen Shot 2019-05-15 at 5.08.26 pm.png!

Looking at the source code, I see that a TreeMap is used to map each field name to its FieldsProducer, which means a lookup incurs O(log N) string comparisons.

{code:java}
private static class FieldsReader extends FieldsProducer {
  ...
  private final Map<String,FieldsProducer> fields = new TreeMap<>();
  ...
  @Override
  public Terms terms(String field) throws IOException {
    FieldsProducer fieldsProducer = fields.get(field);
    return fieldsProducer == null ? null : fieldsProducer.terms(field);
  }
{code}

The problem becomes much worse when field names are long and share a common prefix, because each comparison has to scan past the whole shared prefix before it reaches a differing character. In our case, the index has around 6000 fields of the form customfield_*.

I wonder whether we could change the TreeMap to a HashMap, or to a LinkedHashMap populated in sorted order if we want to preserve the sorted iteration order, to improve the situation.
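
To make the cost concrete, here is a minimal standalone sketch. It is not Lucene code: the class FieldLookupSketch, its methods, and the Integer values standing in for FieldsProducer instances are all hypothetical. It counts how many characters a TreeMap lookup inspects when about 6000 keys share the customfield_ prefix, and shows one way to get a hash-based map that still iterates in sorted order, on the assumption that callers need that order.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class FieldLookupSketch {

    // Characters inspected by countingCompare since the last reset (not thread-safe).
    static long charsCompared = 0;

    // Same ordering as String.compareTo, but tallies every character it inspects.
    static int countingCompare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            charsCompared++;
            int diff = a.charAt(i) - b.charAt(i);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    }

    // Builds a sorted map of hypothetical field names customfield_0 .. customfield_(n-1).
    // The Integer values are placeholders for the per-field producers.
    static TreeMap<String, Integer> buildSortedIndex(int fieldCount) {
        Comparator<String> cmp = FieldLookupSketch::countingCompare;
        TreeMap<String, Integer> sorted = new TreeMap<>(cmp);
        for (int i = 0; i < fieldCount; i++) {
            sorted.put("customfield_" + i, i);
        }
        return sorted;
    }

    // Copies entries in the TreeMap's iteration order. LinkedHashMap keeps
    // insertion order, so the copy iterates in the same sorted order while
    // lookups go through the hash table instead of the tree.
    static Map<String, Integer> toHashedIndex(TreeMap<String, Integer> sorted) {
        return new LinkedHashMap<>(sorted);
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> sorted = buildSortedIndex(6000);
        Map<String, Integer> hashed = toHashedIndex(sorted);

        // Reset the counter so that only the lookup is measured, not the inserts.
        charsCompared = 0;
        Integer viaTree = sorted.get("customfield_5999");
        System.out.println("TreeMap lookup: value=" + viaTree
                + ", characters compared=" + charsCompared);

        Integer viaHash = hashed.get("customfield_5999");
        System.out.println("LinkedHashMap lookup: value=" + viaHash);
    }
}
```

Every comparison against a different key walks the full 12-character prefix before it can decide, and a tree lookup performs one such comparison per level. The hash-based lookup instead hashes the key (String caches its hash code per instance) and, on a hit, typically performs a single equals check. A plain HashMap would give the same lookup cost but no defined iteration order.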