This is an automated email from the ASF dual-hosted git repository. ajantha pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
     new 3800492  [CARBONDATA-3879] Filtering Segments Optimization
3800492 is described below

commit 38004928dfd7cef3635f401fe00b1c4a31b6ffc4
Author: haomarch <marchp...@126.com>
AuthorDate: Thu Aug 6 09:48:34 2020 +0800

    [CARBONDATA-3879] Filtering Segments Optimization

    Why is this PR needed?
    The segment-filtering flow calls List.contains many times, which is
    very slow when a table has tens of thousands of segments. For example,
    with 50000 segments, List.contains is called once per segment against
    a list that also holds about 50000 elements, so the total cost is on
    the order of 50000 * 50000 element comparisons, i.e. O(n^2) in the
    number of segments.

    What changes were proposed in this PR?
    Build a map keyed by segment number once per call, then replace the
    List.contains checks with Map.containsKey and the List.indexOf
    lookups with Map.get.

    Does this PR introduce any user interface change?
    No

    Is any new testcase added?
    No

    This closes #3880
---
 .../org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
index e61f742..2dd52a4 100644
--- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
+++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
@@ -389,14 +389,16 @@ public class CarbonTableInputFormat<T> extends CarbonInputFormat<T> {
   public void updateLoadMetaDataDetailsToSegments(List<Segment> validSegments,
       List<org.apache.carbondata.hadoop.CarbonInputSplit> prunedSplits) {
+    Map<String, Segment> validSegmentsMap = validSegments.stream()
+        .collect(Collectors.toMap(Segment::getSegmentNo, segment -> segment, (e1, e2) -> e1));
     for (CarbonInputSplit split : prunedSplits) {
       Segment segment = split.getSegment();
       if (segment.getLoadMetadataDetails() == null || segment.getReadCommittedScope() == null) {
-        if (validSegments.contains(segment)) {
+        if (validSegmentsMap.containsKey(segment.getSegmentNo())) {
           segment.setLoadMetadataDetails(
-              validSegments.get(validSegments.indexOf(segment)).getLoadMetadataDetails());
+              validSegmentsMap.get(segment.getSegmentNo()).getLoadMetadataDetails());
           segment.setReadCommittedScope(
-              validSegments.get(validSegments.indexOf(segment)).getReadCommittedScope());
+              validSegmentsMap.get(segment.getSegmentNo()).getReadCommittedScope());
         }
       }
     }
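
The lookup pattern this commit switches to can be sketched in isolation. The example below is a minimal illustration and not CarbonData code: `Seg`, `indexBySegmentNo`, and `fillMissingDetails` are hypothetical names, and a single `details` string stands in for the load metadata and read-committed scope that the real `Segment` carries. It also assumes, as the commit's choice of key implies, that the segment number alone identifies a segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SegmentLookupSketch {

  // Hypothetical stand-in for a segment: a segment number plus one metadata field.
  public static final class Seg {
    private final String segmentNo;
    private String details;

    public Seg(String segmentNo, String details) {
      this.segmentNo = segmentNo;
      this.details = details;
    }

    public String getSegmentNo() {
      return segmentNo;
    }

    public String getDetails() {
      return details;
    }

    public void setDetails(String details) {
      this.details = details;
    }
  }

  // Builds the index once in O(n). On duplicate segment numbers the first
  // entry wins, mirroring the (e1, e2) -> e1 merge function in the diff.
  public static Map<String, Seg> indexBySegmentNo(List<Seg> validSegments) {
    return validSegments.stream()
        .collect(Collectors.toMap(Seg::getSegmentNo, Function.identity(), (e1, e2) -> e1));
  }

  // Copies details from the valid segments onto split segments that lack them.
  // Each lookup is an expected O(1) hash probe instead of an O(n) list scan.
  // Returns the number of split segments that were updated.
  public static int fillMissingDetails(List<Seg> validSegments, List<Seg> splitSegments) {
    Map<String, Seg> bySegmentNo = indexBySegmentNo(validSegments);
    int updated = 0;
    for (Seg split : splitSegments) {
      if (split.getDetails() == null) {
        Seg match = bySegmentNo.get(split.getSegmentNo());
        if (match != null) {
          split.setDetails(match.getDetails());
          updated++;
        }
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    List<Seg> valid = new ArrayList<>();
    valid.add(new Seg("0", "meta-0"));
    valid.add(new Seg("1", "meta-1"));

    List<Seg> splits = new ArrayList<>();
    splits.add(new Seg("1", null)); // missing details, present in the valid list
    splits.add(new Seg("7", null)); // missing details, not in the valid list

    int updated = fillMissingDetails(valid, splits);
    System.out.println("updated=" + updated);
  }
}
```

With n valid segments and m splits, the sketch does O(n) work to build the map and expected O(m) work for the lookups, compared with O(n * m) for repeated `List.contains` and `List.indexOf` scans. A single `get` followed by a null check is used here in place of the `containsKey` plus two `get` calls in the diff; that is a stylistic variation, not something the commit does.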