[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-28 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513249713



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,43 @@ private static Boolean 
checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or 
not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta 
compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more 
than
+   * numberDeltaFilesThreshold, this segment will be selected.
*
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
*/
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List checkDeleteDeltaFilesInSeg(Segment seg,
   SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
 
+List blockLists = new ArrayList<>();
 Set uniqueBlocks = new HashSet();
 List blockNameList =
 segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-for (final String blockName : blockNameList) {
-
-  CarbonFile[] deleteDeltaFiles =
+for (String blockName : blockNameList) {
+  List deleteDeltaFiles =
   segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-  if (null != deleteDeltaFiles) {
+  if (null != deleteDeltaFiles && deleteDeltaFiles.size() > 
numberDeltaFilesThreshold) {

Review comment:
   Added test case "test IUD Horizontal Compaction when 
CarbonCommonConstants.DELETE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION >= 3"
   in HorizontalCompactionTestCase





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-28 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513249194



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -130,6 +130,7 @@ object HorizontalCompaction {
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
+LOG.debug(s"The segment list for Horizontal Update Compaction is 
$validSegList")

Review comment:
   Done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -177,6 +178,7 @@ object HorizontalCompaction {
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
+LOG.debug(s"The segment list for Horizontal Update Compaction is 
$deletedBlocksList")

Review comment:
   Done

##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-28 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513248715



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
-final long deltaStartTimestamp =
-
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-final long deltaEndTimeStamp =
-getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+Set deltaFileTimestamps = block.getDeltaFileStamps();
+String deleteDeltaFilePrefix = segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN;
+if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+  deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+  deleteDeltaFilePrefix + timestamp + 
CarbonCommonConstants.DELETE_DELTA_FILE_EXT));

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-28 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r513248146



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,41 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
-final long deltaStartTimestamp =
-
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-final long deltaEndTimeStamp =
-getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+Set deltaFileTimestamps = block.getDeltaFileStamps();
+String deleteDeltaFilePrefix = segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN;
+if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {

Review comment:
   Added this test case "test IUD Horizontal Compaction after updating 
without compaction"
   in HorizontalCompactionTestCase





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-23 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510677675



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean 
checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or 
not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta 
compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more 
than
+   * numberDeltaFilesThreshold, this segment will be selected.
*
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
*/
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List checkDeleteDeltaFilesInSeg(Segment seg,
   SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
 
+List blockLists = new ArrayList<>();
 Set uniqueBlocks = new HashSet();
 List blockNameList =
 segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-for (final String blockName : blockNameList) {
-
-  CarbonFile[] deleteDeltaFiles =
+for (String blockName : blockNameList) {
+  List deleteDeltaFiles =
   segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-  if (null != deleteDeltaFiles) {
-// The Delete Delta files may have Spill over blocks. Will consider 
multiple spill over
-// blocks as one. Currently DeleteDeltaFiles array contains Delete 
Delta Block name which
-// lies within Delete Delta Start TimeStamp and End TimeStamp. In 
order to eliminate
-// Spill Over Blocks will choose files with unique taskID.
-for (CarbonFile blocks : deleteDeltaFiles) {
-  // Get Task ID and the Timestamp from the Block name for e.g.
-  // part-0-3-1481084721319.carbondata => "3-1481084721319"
-  String task = 
CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-  String timestamp =
-  
CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-  String taskAndTimeStamp = task + "-" + timestamp;
+  if (null != deleteDeltaFiles && deleteDeltaFiles.size() > 
numberDeltaFilesThreshold) {
+for (String file : deleteDeltaFiles) {

Review comment:
   formatted and modified the variable name





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510065227



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean 
checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or 
not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta 
compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more 
than
+   * numberDeltaFilesThreshold, this segment will be selected.
*
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
*/
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List checkDeleteDeltaFilesInSeg(Segment seg,
   SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
 
+List blockLists = new ArrayList<>();
 Set uniqueBlocks = new HashSet();
 List blockNameList =
 segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-for (final String blockName : blockNameList) {
-
-  CarbonFile[] deleteDeltaFiles =
+for (String blockName : blockNameList) {
+  List deleteDeltaFiles =
   segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-  if (null != deleteDeltaFiles) {
-// The Delete Delta files may have Spill over blocks. Will consider 
multiple spill over
-// blocks as one. Currently DeleteDeltaFiles array contains Delete 
Delta Block name which
-// lies within Delete Delta Start TimeStamp and End TimeStamp. In 
order to eliminate
-// Spill Over Blocks will choose files with unique taskID.
-for (CarbonFile blocks : deleteDeltaFiles) {
-  // Get Task ID and the Timestamp from the Block name for e.g.
-  // part-0-3-1481084721319.carbondata => "3-1481084721319"
-  String task = 
CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-  String timestamp =
-  
CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-  String taskAndTimeStamp = task + "-" + timestamp;
+  if (null != deleteDeltaFiles && deleteDeltaFiles.size() > 
numberDeltaFilesThreshold) {
+for (String file : deleteDeltaFiles) {

Review comment:
   It was formatted





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510065323



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,20 +1197,16 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
+List deleteFilePathList =
 segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), 
blockName);
 
 String destFileName =
 blockName + "-" + timestamp.toString() + 
CarbonCommonConstants.DELETE_DELTA_FILE_EXT;
-List deleteFilePathList = new ArrayList<>();
-if (null != deleteDeltaFiles && deleteDeltaFiles.length > 0 && null != 
deleteDeltaFiles[0]
-.getParentFile()) {
-  String fullBlockFilePath = 
deleteDeltaFiles[0].getParentFile().getCanonicalPath()
-  + CarbonCommonConstants.FILE_SEPARATOR + destFileName;
-
-  for (CarbonFile cFile : deleteDeltaFiles) {
-deleteFilePathList.add(cFile.getCanonicalPath());
-  }
+if (deleteFilePathList.size() > 0) {
+  String deleteDeltaFilePath = deleteFilePathList.get(0);
+  String fullBlockFilePath =
+  deleteDeltaFilePath.substring(0, 
deleteDeltaFilePath.lastIndexOf("/")) +

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510064286



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean 
checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or 
not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta 
compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more 
than
+   * numberDeltaFilesThreshold, this segment will be selected.
*
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
*/
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List checkDeleteDeltaFilesInSeg(Segment seg,
   SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
 
+List blockLists = new ArrayList<>();
 Set uniqueBlocks = new HashSet();
 List blockNameList =
 segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-for (final String blockName : blockNameList) {
-
-  CarbonFile[] deleteDeltaFiles =
+for (String blockName : blockNameList) {
+  List deleteDeltaFiles =
   segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-  if (null != deleteDeltaFiles) {
-// The Delete Delta files may have Spill over blocks. Will consider 
multiple spill over

Review comment:
   Recovered the existing comments. I will pay attention to it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510063577



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -130,6 +130,7 @@ object HorizontalCompaction {
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
+LOG.debug(s"The segment list for Horizontal Update Compaction is ${ 
validSegList }")

Review comment:
   Done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -177,6 +178,7 @@ object HorizontalCompaction {
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
+LOG.debug(s"The segment list for Horizontal Update Compaction is ${ 
deletedBlocksList }")

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510063450



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
-final long deltaStartTimestamp =
-
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-final long deltaEndTimeStamp =
-getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+Set deltaFileTimestamps = block.getDeltaFileStamps();
+String deleteDeltaFilePrefix = segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN;
+if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+  deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+  deleteDeltaFilePrefix + timestamp + 
CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+} else {
+  final long deltaEndTimeStamp =

Review comment:
   ok, I understood.  Added a comment here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509956278



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
-final long deltaStartTimestamp =
-
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-final long deltaEndTimeStamp =
-getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+Set deltaFileTimestamps = block.getDeltaFileStamps();
+String deleteDeltaFilePrefix = segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN;
+if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+  deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+  deleteDeltaFilePrefix + timestamp + 
CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+} else {
+  final long deltaEndTimeStamp =

Review comment:
   The part in else{} is when deltaFileTimestamps is null. 
   The updateDetails has same start and end timestamp, here I use endTimestamp 
to form the delta file





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509948458



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509947651



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1039,22 +1039,24 @@ private static boolean 
isSegmentValid(LoadMetadataDetails seg) {
 if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
   int numberDeleteDeltaFilesThreshold =
   
CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-  List deleteSegments = new ArrayList<>();
-  for (Segment seg : segments) {
-if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-numberDeleteDeltaFilesThreshold)) {
-  deleteSegments.add(seg);
+
+  // firstly find the valid segments which are updated from 
SegmentUpdateDetails,
+  // in order to reduce the segment list for behind traversal
+  List segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
   combined the `getDeleteDeltaFilesInSeg`and `checkDeleteDeltaFilesInSeg` 
, and kept the method name `checkDeleteDeltaFilesInSeg` and let it return 
`List`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509947651



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1039,22 +1039,24 @@ private static boolean 
isSegmentValid(LoadMetadataDetails seg) {
 if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
   int numberDeleteDeltaFilesThreshold =
   
CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-  List deleteSegments = new ArrayList<>();
-  for (Segment seg : segments) {
-if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-numberDeleteDeltaFilesThreshold)) {
-  deleteSegments.add(seg);
+
+  // firstly find the valid segments which are updated from 
SegmentUpdateDetails,
+  // in order to reduce the segment list for behind traversal
+  List segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
   combined the `getDeleteDeltaFilesInSeg`and `checkDeleteDeltaFilesInSeg` 
, and kept the method name `checkDeleteDeltaFilesInSeg` and let it return 
List





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509945109



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
 final long deltaStartTimestamp =
 
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
 final long deltaEndTimeStamp =
 getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+Set deleteDeltaFiles = new HashSet<>();

Review comment:
   modified code like this approach





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509943740



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+
+List deleteDeltaFileList = new ArrayList<>();
 String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+identifier.getTablePath(), seg.getSegmentNo());
+
 for (SegmentUpdateDetails block : updateDetails) {
   if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-  (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-  && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus( {
+  (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+  !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
 final long deltaStartTimestamp =
 
getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
 final long deltaEndTimeStamp =
 getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, 
block);
-
-return segDir.listFiles(new CarbonFileFilter() {
-
-  @Override
-  public boolean accept(CarbonFile pathName) {
-String fileName = pathName.getName();
-if (pathName.getSize() > 0
-&& 
fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-  String blkName = fileName.substring(0, 
fileName.lastIndexOf("-"));
-  long timestamp =
-  
Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-  return blockName.equals(blkName) && timestamp <= 
deltaEndTimeStamp
-  && timestamp >= deltaStartTimestamp;
-}
-return false;
-  }
-});
+Set deleteDeltaFiles = new HashSet<>();
+deleteDeltaFiles.add(segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN + deltaStartTimestamp +
+CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
+deleteDeltaFiles.add(segmentPath + 
CarbonCommonConstants.FILE_SEPARATOR +
+blockName + CarbonCommonConstants.HYPHEN + deltaEndTimeStamp +
+CarbonCommonConstants.DELETE_DELTA_FILE_EXT);

Review comment:
   I modified this method by calling `getDeltaFileStamps ` as the comment 
below





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509941524



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -125,12 +125,18 @@ object HorizontalCompaction {
   segLists: util.List[Segment]): Unit = {
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+val startTime = System.nanoTime()
+
 // get the valid segments qualified for update compaction.
 val validSegList = 
CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
 
+val endTime = System.nanoTime()
+LOG.info(s"time taken to get segment list for Horizontal Update Compaction 
is" +

Review comment:
   I have modified the log, which only prints the validSegList and is in 
LOG.debug

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -173,11 +179,17 @@ object HorizontalCompaction {
 
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+val startTime = System.nanoTime()
+
 val deletedBlocksList = 
CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
 
+val endTime = System.nanoTime()
+LOG.info(s"time taken to get deleted block list for Horizontal Delete 
Compaction is" +

Review comment:
   I have modified the log, which only prints the validSegList and is in 
LOG.debug





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933755



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+LOG.info(s"Horizontal Delete Compaction operation is getting valid 
segments for [$db.$table].")

Review comment:
   I have modified the log, which only prints the validSegList and is in 
LOG.debug





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933445



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -125,6 +125,9 @@ object HorizontalCompaction {
   segLists: util.List[Segment]): Unit = {
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+LOG.info(s"Horizontal Update Compaction operation is getting valid 
segments for [$db.$table].")

Review comment:
   I have modified the log, which only prints the validSegList and is in 
LOG.debug





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-20 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508943092



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all deletedelta files in the segment dir, then loop the files 
and find
+   * a map of blocks and .deletedelta files related to each block.
+   *
+   * @param seg the segment which is to find blocks
+   * @return a map of block and its file list
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
-String segmentPath = CarbonTablePath.getSegmentPath(
-identifier.getTablePath(), segmentId.getSegmentNo());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath);
+  public Map> getDeleteDeltaFilesList(final Segment 
seg) {
+
+Map blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
   I modified this method as above approach. 
   The delete delta file name is generated from updatedetails, without listing 
files





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-20 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933755



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+LOG.info(s"Horizontal Delete Compaction operation is getting valid 
segments for [$db.$table].")

Review comment:
   This log is modified, prints the deletedBlocksList and time taken to get 
it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-20 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r508933445



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -125,6 +125,9 @@ object HorizontalCompaction {
   segLists: util.List[Segment]): Unit = {
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+LOG.info(s"Horizontal Update Compaction operation is getting valid 
segments for [$db.$table].")

Review comment:
   This log is modified, prints the validSegList and time taken to get it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380824



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), 
blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List blockNameList = 
segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map> blockAndDeleteDeltaFilesMap = new 
HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {
+  blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+}
+if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380663



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), 
blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List blockNameList = 
segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map> blockAndDeleteDeltaFilesMap = new 
HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {

Review comment:
   Done. Combined the two judgement





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380251



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), 
blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List blockNameList = 
segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map> blockAndDeleteDeltaFilesMap = new 
HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-18 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507380031



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,8 +1209,22 @@ public static boolean isHorizontalCompactionEnabled() {
 // set the update status.
 segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-CarbonFile[] deleteDeltaFiles =
-segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), 
blockName);
+// only when SegmentUpdateDetails contain the specified block
+// will the method getDeleteDeltaFilesList be executed
+List blockNameList = 
segmentUpdateStatusManager.getBlockNameFromSegment(seg);
+Map> blockAndDeleteDeltaFilesMap = new 
HashMap<>();
+CarbonFile[] deleteDeltaFiles = null;
+if (blockNameList.contains(blockName)) {
+  blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg));
+}
+if (blockAndDeleteDeltaFilesMap.containsKey(blockName)) {
+  List deleteDeltaFileList = 
blockAndDeleteDeltaFilesMap.get(blockName);
+  deleteDeltaFiles = deleteDeltaFileList.toArray(new 
CarbonFile[deleteDeltaFileList.size()]);
+}
+
+// CarbonFile[] deleteDeltaFiles =

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-16 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506203367



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
 return null;
   }
 
+  public Map> getDeleteDeltaFilesForSegment(final 
Segment seg) {
+String segmentPath = CarbonTablePath.getSegmentPath(
+  identifier.getTablePath(), seg.getSegmentNo());

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-16 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506202461



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
 return null;
   }
 
+  public Map> getDeleteDeltaFilesForSegment(final 
Segment seg) {

Review comment:
   Done.
   Keep function name getDeleteDeltaFilesList and change the return type to 
Map>





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-16 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020436



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1039,22 +1039,10 @@ private static boolean 
isSegmentValid(LoadMetadataDetails seg) {
 if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
   int numberDeleteDeltaFilesThreshold =
   
CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-  List deleteSegments = new ArrayList<>();
   for (Segment seg : segments) {
-if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,

Review comment:
   Done.
   Combined function checkDeleteDeltaFilesInSeg and function 
getDeleteDeltaFilesInSeg to new function checkAndGetDeleteDeltaFilesInSeg





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-16 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020781



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1210,6 +1198,39 @@ private static boolean 
checkDeleteDeltaFilesInSeg(Segment seg,
 return blockLists;
   }
 
+  private static List checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+  SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
+
+List blockLists = new ArrayList<>();
+
+Map> blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+List blockNameList =
+  segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+Set uniqueBlocks = new HashSet();
+for (final String blockName : blockNameList) {
+
+  List deleteDeltaFiles = 
blockAndDeleteDeltaFilesMap.get(blockName);
+
+  if (null != deleteDeltaFiles) {
+for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
   Done. 
   Added judgement: 
   if (deleteDeltaFiles.size() <= numberDeltaFilesThreshold) continue





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-15 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020781



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1210,6 +1198,39 @@ private static boolean 
checkDeleteDeltaFilesInSeg(Segment seg,
 return blockLists;
   }
 
+  private static List checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+  SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
+
+List blockLists = new ArrayList<>();
+
+Map> blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+List blockNameList =
+  segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+Set uniqueBlocks = new HashSet();
+for (final String blockName : blockNameList) {
+
+  List deleteDeltaFiles = 
blockAndDeleteDeltaFilesMap.get(blockName);
+
+  if (null != deleteDeltaFiles) {
+for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-15 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020858



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1210,6 +1198,39 @@ private static boolean 
checkDeleteDeltaFilesInSeg(Segment seg,
 return blockLists;
   }
 
+  private static List checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+  SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
+
+List blockLists = new ArrayList<>();
+
+Map> blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+List blockNameList =
+  segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+Set uniqueBlocks = new HashSet();
+for (final String blockName : blockNameList) {
+
+  List deleteDeltaFiles = 
blockAndDeleteDeltaFilesMap.get(blockName);
+
+  if (null != deleteDeltaFiles) {
+for (CarbonFile blocks : deleteDeltaFiles) {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-15 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020950



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -455,6 +455,51 @@ public boolean accept(CarbonFile pathName) {
 return null;
   }
 
+  public Map> getDeleteDeltaFilesForSegment(final 
Segment seg) {
+String segmentPath = CarbonTablePath.getSegmentPath(
+  identifier.getTablePath(), seg.getSegmentNo());
+CarbonFile segDir = FileFactory.getCarbonFile(segmentPath);
+CarbonFile[] allDeleteDeltaFilesOfSegment = segDir.listFiles(new 
CarbonFileFilter() {
+  @Override
+  public boolean accept(CarbonFile pathName) {
+String fileName = pathName.getName();
+return (pathName.getSize() > 0) &&

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-15 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020436



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1039,22 +1039,10 @@ private static boolean 
isSegmentValid(LoadMetadataDetails seg) {
 if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
   int numberDeleteDeltaFilesThreshold =
   
CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-  List deleteSegments = new ArrayList<>();
   for (Segment seg : segments) {
-if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-15 Thread GitBox


shenjiayu17 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r506020565



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1210,6 +1198,39 @@ private static boolean 
checkDeleteDeltaFilesInSeg(Segment seg,
 return blockLists;
   }
 
+  private static List checkAndGetDeleteDeltaFilesInSeg(Segment seg,
+  SegmentUpdateStatusManager segmentUpdateStatusManager, int 
numberDeltaFilesThreshold) {
+
+List blockLists = new ArrayList<>();
+
+Map> blockAndDeleteDeltaFilesMap =
+  segmentUpdateStatusManager.getDeleteDeltaFilesForSegment(seg);
+
+List blockNameList =
+  segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
+
+Set uniqueBlocks = new HashSet();
+for (final String blockName : blockNameList) {
+
+  List deleteDeltaFiles = 
blockAndDeleteDeltaFilesMap.get(blockName);
+
+  if (null != deleteDeltaFiles) {
+for (CarbonFile blocks : deleteDeltaFiles) {
+  String task = 
CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
+  String timestamp =
+
CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
+  String taskAndTimeStamp = task + "-" + timestamp;
+  uniqueBlocks.add(taskAndTimeStamp);

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org