This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 063d9b2  [CARBONDATA-3894] [IUD] Decrease the size of the tableupdatestatus file by removing invalid segments that do not exist in tablestatus
063d9b2 is described below

commit 063d9b2aff86f66f22ce75bc6905affc8a4bd8df
Author: Zhangshunyu <zhangshunyu1...@126.com>
AuthorDate: Thu Jul 9 11:23:39 2020 +0800

    [CARBONDATA-3894] [IUD] Decrease the size of the tableupdatestatus file by removing invalid segments that do not exist in tablestatus
    
    Why is this PR needed?
    The tableupdatestatus file keeps segment info even after the compacted
    segment has already been deleted. This makes the file grow quickly,
    which is bad for performance.
    After this change, the tableupdatestatus file size can decrease from ~MB
    to ~KB.
    
    What changes were proposed in this PR?
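    Before writing the tableupdatestatus file, remove the update details of
    invalid segments, i.e. segments that no longer exist in tablestatus.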
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3833
---
 .../apache/carbondata/core/mutate/CarbonUpdateUtil.java  | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
index e915c66..77ebf3e 100644
--- a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
@@ -148,7 +148,21 @@ public class CarbonUpdateUtil {
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+        Set<String> loadDetailsSet = new HashSet<>();
+        for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+          loadDetailsSet.add(details.getLoadName());
+        }
+        for (SegmentUpdateDetails updateDetails : oldList) {
+          if (loadDetailsSet.contains(updateDetails.getSegmentName())) {
+            // we should only keep the update info of segments in table status, especially after
+            // compaction and clean files some compacted segments will be removed. It can keep
+            // tableupdatestatus file in small size which is good for performance.
+            updateDetailsValidSeg.add(updateDetails);
+          }
+        }
+        segmentUpdateStatusManager
+            .writeLoadDetailsIntoFile(updateDetailsValidSeg, updateStatusFileIdentifier);
         status = true;
       } else {
         LOGGER.error("Not able to acquire the segment update lock.");
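
The filtering step in the hunk above can be tried in isolation with the minimal sketch below. It is not CarbonData code: the class UpdateStatusFilterSketch, the type UpdateEntry (a stand-in for SegmentUpdateDetails), and the method keepValidSegments are hypothetical names introduced only for illustration, and the list of load names stands in for what getLoadMetadataDetails() would supply. It assumes, as the diff does, that an update entry is valid exactly when its segment name appears among the load names in tablestatus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpdateStatusFilterSketch {

  // Hypothetical stand-in for SegmentUpdateDetails; only the segment name
  // is used by the filter.
  public static final class UpdateEntry {
    public final String segmentName;
    public final String blockName;

    public UpdateEntry(String segmentName, String blockName) {
      this.segmentName = segmentName;
      this.blockName = blockName;
    }
  }

  // Keep only the entries whose segment still appears in the table status.
  // loadNames plays the role of the load names read from tablestatus.
  public static List<UpdateEntry> keepValidSegments(
      List<UpdateEntry> entries, List<String> loadNames) {
    // A HashSet gives constant-time membership checks per entry.
    Set<String> validSegments = new HashSet<>(loadNames);
    List<UpdateEntry> kept = new ArrayList<>();
    for (UpdateEntry entry : entries) {
      if (validSegments.contains(entry.segmentName)) {
        kept.add(entry);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Segments "0" and "1" were compacted into "0.1" and later cleaned up,
    // so only "0.1" and "2" remain in the (assumed) table status.
    List<UpdateEntry> entries = Arrays.asList(
        new UpdateEntry("0", "part-0"),
        new UpdateEntry("1", "part-1"),
        new UpdateEntry("0.1", "part-2"));
    List<String> loadNames = Arrays.asList("0.1", "2");

    for (UpdateEntry entry : keepValidSegments(entries, loadNames)) {
      System.out.println(entry.segmentName + " " + entry.blockName);
    }
  }
}
```

Building the set of load names once keeps the pass linear in the number of update entries, which matters here because the file being trimmed is the one that had grown large.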
