[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510123816



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
       SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
           segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
   I gave a comment to change the variable name and to remove the spaces after and before `:`, please check





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510018289



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1246,20 +1197,16 @@ public static boolean isHorizontalCompactionEnabled() {
     // set the update status.
     segmentUpdateStatusManager.setUpdateStatusDetails(segmentUpdateDetails);
 
-    CarbonFile[] deleteDeltaFiles =
+    List<String> deleteFilePathList =
         segmentUpdateStatusManager.getDeleteDeltaFilesList(new Segment(seg), blockName);
 
     String destFileName =
         blockName + "-" + timestamp.toString() + CarbonCommonConstants.DELETE_DELTA_FILE_EXT;
-    List<String> deleteFilePathList = new ArrayList<>();
-    if (null != deleteDeltaFiles && deleteDeltaFiles.length > 0 && null != deleteDeltaFiles[0]
-        .getParentFile()) {
-      String fullBlockFilePath = deleteDeltaFiles[0].getParentFile().getCanonicalPath()
-          + CarbonCommonConstants.FILE_SEPARATOR + destFileName;
-
-      for (CarbonFile cFile : deleteDeltaFiles) {
-        deleteFilePathList.add(cFile.getCanonicalPath());
-      }
+    if (deleteFilePathList.size() > 0) {
+      String deleteDeltaFilePath = deleteFilePathList.get(0);
+      String fullBlockFilePath =
+          deleteDeltaFilePath.substring(0, deleteDeltaFilePath.lastIndexOf("/")) +

Review comment:
   use a constant for "/"
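The reviewer's point can be sketched as follows. This is a minimal, hypothetical illustration: `DeltaPathUtil` and its local `FILE_SEPARATOR` constant are stand-ins for `CarbonCommonConstants.FILE_SEPARATOR`, not actual CarbonData code.

```java
// Hypothetical sketch: build the destination path from the first delete delta
// file path using a named separator constant instead of a hard-coded "/".
class DeltaPathUtil {
  // stand-in for CarbonCommonConstants.FILE_SEPARATOR
  static final String FILE_SEPARATOR = "/";

  static String fullBlockFilePath(String firstDeltaFilePath, String destFileName) {
    // keep everything up to the last separator, i.e. the parent directory
    String parentDir =
        firstDeltaFilePath.substring(0, firstDeltaFilePath.lastIndexOf(FILE_SEPARATOR));
    return parentDir + FILE_SEPARATOR + destFileName;
  }
}
```

Using the constant keeps the path logic consistent with the rest of the codebase and makes the separator a single point of change.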









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510001821



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
          segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
   ```suggestion
   for (String deleteDeltaFile: deleteDeltaFiles) {
   ```









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510002204



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
          segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over
-        // blocks as one. Currently DeleteDeltaFiles array contains Delete Delta Block name which
-        // lies within Delete Delta Start TimeStamp and End TimeStamp. In order to eliminate
-        // Spill Over Blocks will choose files with unique taskID.
-        for (CarbonFile blocks : deleteDeltaFiles) {
-          // Get Task ID and the Timestamp from the Block name for e.g.
-          // part-0-3-1481084721319.carbondata => "3-1481084721319"
-          String task = CarbonTablePath.DataFileUtil.getTaskNo(blocks.getName());
-          String timestamp =
-              CarbonTablePath.DataFileUtil.getTimeStampFromDeleteDeltaFile(blocks.getName());
-          String taskAndTimeStamp = task + "-" + timestamp;
+      if (null != deleteDeltaFiles && deleteDeltaFiles.size() > numberDeltaFilesThreshold) {
+        for (String file : deleteDeltaFiles) {

Review comment:
   please do the proper formatting









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r510001358



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1138,73 +1126,36 @@ private static Boolean checkUpdateDeltaFilesInSeg(Segment seg,
   }
 
   /**
-   * Check is the segment passed qualifies for IUD delete delta compaction or not i.e.
-   * if the number of delete delta files present in the segment is more than
-   * numberDeltaFilesThreshold.
+   * Check whether the segment passed qualifies for IUD delete delta compaction or not,
+   * i.e., if the number of delete delta files present in the segment is more than
+   * numberDeltaFilesThreshold, this segment will be selected.
    *
-   * @param seg
-   * @param segmentUpdateStatusManager
-   * @param numberDeltaFilesThreshold
-   * @return
+   * @param seg segment to be qualified
+   * @param segmentUpdateStatusManager segments & blocks details management
+   * @param numberDeltaFilesThreshold threshold of delete delta files
+   * @return block list of the segment
    */
-  private static boolean checkDeleteDeltaFilesInSeg(Segment seg,
+  private static List<String> checkDeleteDeltaFilesInSeg(Segment seg,
      SegmentUpdateStatusManager segmentUpdateStatusManager, int numberDeltaFilesThreshold) {
 
+    List<String> blockLists = new ArrayList<>();
     Set<String> uniqueBlocks = new HashSet<>();
     List<String> blockNameList =
         segmentUpdateStatusManager.getBlockNameFromSegment(seg.getSegmentNo());
-
-    for (final String blockName : blockNameList) {
-
-      CarbonFile[] deleteDeltaFiles =
+    for (String blockName : blockNameList) {
+      List<String> deleteDeltaFiles =
          segmentUpdateStatusManager.getDeleteDeltaFilesList(seg, blockName);
-      if (null != deleteDeltaFiles) {
-        // The Delete Delta files may have Spill over blocks. Will consider multiple spill over

Review comment:
   please do not delete the existing comments in the code; add them back
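For context, the comment the reviewer wants restored describes de-duplicating spill-over blocks by their unique taskID plus timestamp. A minimal sketch of that idea follows; `getTaskNo` and `getTimeStamp` here are simplified stand-ins for `CarbonTablePath.DataFileUtil`, assuming the `part-<bucket>-<taskNo>-<timestamp>` naming shown in the removed code, not the real implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the spill-over de-duplication the removed comment describes:
// files sharing the same "<taskNo>-<timestamp>" key are counted once.
class SpillOverDedup {
  // e.g. "part-0-3-1481084721319.carbondata" -> "3"
  static String getTaskNo(String fileName) {
    return fileName.split("-")[2];
  }

  // e.g. "part-0-3-1481084721319.carbondata" -> "1481084721319"
  static String getTimeStamp(String fileName) {
    String last = fileName.split("-")[3];
    return last.substring(0, last.indexOf('.'));
  }

  // count distinct blocks after collapsing spill-over files
  static int uniqueBlockCount(String[] deltaFiles) {
    Set<String> uniqueKeys = new LinkedHashSet<>();
    for (String name : deltaFiles) {
      uniqueKeys.add(getTaskNo(name) + "-" + getTimeStamp(name));
    }
    return uniqueKeys.size();
  }
}
```

Keeping the explanatory comment next to this logic matters because the dedup key is not obvious from the file names alone.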









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509979972



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -130,6 +130,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ validSegList }")

Review comment:
   remove brackets, which are unnecessary









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509980034



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -177,6 +178,7 @@ object HorizontalCompaction {
       absTableIdentifier,
       segmentUpdateStatusManager,
       compactionTypeIUD)
+    LOG.debug(s"The segment list for Horizontal Update Compaction is ${ deletedBlocksList }")

Review comment:
   remove brackets, which are unnecessary









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509979593



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
   yes, I mean to say: add a comment in the code so that it will be easy for whoever works on it next to understand. Always add comments wherever you feel they are important









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920810



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
-        final long deltaStartTimestamp =
-            getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-        final long deltaEndTimeStamp =
-            getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
+        Set<String> deltaFileTimestamps = block.getDeltaFileStamps();
+        String deleteDeltaFilePrefix = segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN;
+        if (deltaFileTimestamps != null && deltaFileTimestamps.size() > 0) {
+          deltaFileTimestamps.forEach(timestamp -> deleteDeltaFileList.add(
+              deleteDeltaFilePrefix + timestamp + CarbonCommonConstants.DELETE_DELTA_FILE_EXT));
+        } else {
+          final long deltaEndTimeStamp =

Review comment:
   add a comment here saying that when `deltaFileTimestamps` is null, there is only one delta file and the update details will have the same start and end timestamps









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-22 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509920920



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,39 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each 
SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final 
String blockName) {
+  public List getDeleteDeltaFilesList(final Segment seg, final String 
blockName) {
+

Review comment:
   remove empty lines









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-21 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509221753



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java
##
@@ -1039,22 +1039,24 @@ private static boolean isSegmentValid(LoadMetadataDetails seg) {
     if (CompactionType.IUD_DELETE_DELTA == compactionTypeIUD) {
       int numberDeleteDeltaFilesThreshold =
           CarbonProperties.getInstance().getNoDeleteDeltaFilesThresholdForIUDCompaction();
-      List<Segment> deleteSegments = new ArrayList<>();
-      for (Segment seg : segments) {
-        if (checkDeleteDeltaFilesInSeg(seg, segmentUpdateStatusManager,
-            numberDeleteDeltaFilesThreshold)) {
-          deleteSegments.add(seg);
+
+      // firstly find the valid segments which are updated from SegmentUpdateDetails,
+      // in order to reduce the segment list for behind traversal
+      List<Segment> segmentsPresentInSegmentUpdateDetails = new ArrayList<>();

Review comment:
   you don't need to delete the old code; it already goes into that logic if the `updateDetails` are present. So you can just put the `getDeleteDeltaFilesInSeg` method inside `checkDeleteDeltaFilesInSeg`, and there is no need to completely modify `getSegListIUDCompactionQualified`; it will be cleaner.









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-21 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509204221



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();

Review comment:
   instead of this, you can do the following:
   1. From `SegmentUpdateDetails`, call the `getDeltaFileStamps` method and check that it is not null and its size > 0; if not null, directly take the list of delta timestamps and prepare the final list with the block name, which we already know.
   2. If `getDeltaFileStamps` gives null, then there is only one valid delta timestamp. In this case, `SegmentUpdateDetails` will have the same delta start and end timestamp, so you can take one of them, form the delta file name, and return only that, as only one file is there. There is no need to create a list just to add one element, so you can create the list only in the first case.
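The two cases described above can be sketched roughly as follows, under the assumption that a delete delta file name has the shape `<segmentPath>/<blockName>-<timestamp>.deletedelta`. The constants and the `deltaFileStamps` parameter are simplified stand-ins for the CarbonData API, not the exact implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the suggested two-case delete delta file lookup.
class DeleteDeltaFileNames {
  static final String FILE_SEPARATOR = "/";        // stand-in constants
  static final String HYPHEN = "-";
  static final String DELETE_DELTA_EXT = ".deletedelta";

  // Case 1: delta-file timestamps recorded -> one file name per timestamp.
  // Case 2: timestamps null/empty -> exactly one delta file; start and end
  // timestamps in the update details are the same, so either one names it.
  static List<String> deltaFileNames(String segmentPath, String blockName,
      Set<String> deltaFileStamps, long endTimestamp) {
    List<String> files = new ArrayList<>();
    String prefix = segmentPath + FILE_SEPARATOR + blockName + HYPHEN;
    if (deltaFileStamps != null && !deltaFileStamps.isEmpty()) {
      for (String stamp : deltaFileStamps) {
        files.add(prefix + stamp + DELETE_DELTA_EXT);
      }
    } else {
      files.add(prefix + endTimestamp + DELETE_DELTA_EXT);
    }
    return files;
  }
}
```

The design point is that the file names can be derived entirely from metadata, avoiding the file-system listing the old code performed.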









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-21 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509202313



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,38 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files of the block of specified segment.
+   * Actually, delete delta file name is generated from each SegmentUpdateDetails.
+   *
+   * @param seg the segment which is to find block and its delete delta files
+   * @param blockName the specified block of the segment
+   * @return delete delta file list of the block
    */
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
+  public List<String> getDeleteDeltaFilesList(final Segment seg, final String blockName) {
+
+    List<String> deleteDeltaFileList = new ArrayList<>();
     String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+        identifier.getTablePath(), seg.getSegmentNo());
+
     for (SegmentUpdateDetails block : updateDetails) {
       if ((block.getBlockName().equalsIgnoreCase(blockName)) &&
-          (block.getSegmentName().equalsIgnoreCase(segmentId.getSegmentNo()))
-          && !CarbonUpdateUtil.isBlockInvalid((block.getSegmentStatus()))) {
+          (block.getSegmentName().equalsIgnoreCase(seg.getSegmentNo())) &&
+          !CarbonUpdateUtil.isBlockInvalid(block.getSegmentStatus())) {
         final long deltaStartTimestamp =
             getStartTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
         final long deltaEndTimeStamp =
             getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block);
-
-        return segDir.listFiles(new CarbonFileFilter() {
-
-          @Override
-          public boolean accept(CarbonFile pathName) {
-            String fileName = pathName.getName();
-            if (pathName.getSize() > 0
-                && fileName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT)) {
-              String blkName = fileName.substring(0, fileName.lastIndexOf("-"));
-              long timestamp =
-                  Long.parseLong(CarbonTablePath.DataFileUtil.getTimeStampFromFileName(fileName));
-              return blockName.equals(blkName) && timestamp <= deltaEndTimeStamp
-                  && timestamp >= deltaStartTimestamp;
-            }
-            return false;
-          }
-        });
+        Set<String> deleteDeltaFiles = new HashSet<>();
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaStartTimestamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);
+        deleteDeltaFiles.add(segmentPath + CarbonCommonConstants.FILE_SEPARATOR +
+            blockName + CarbonCommonConstants.HYPHEN + deltaEndTimeStamp +
+            CarbonCommonConstants.DELETE_DELTA_FILE_EXT);

Review comment:
   why are you adding this file twice here, once with the start timestamp and once with the end timestamp? This is wrong, please check
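To make the point under discussion concrete, here is a minimal, self-contained sketch (not the CarbonData API; the class name and the constant values are stand-ins for `CarbonCommonConstants`) of how the patch composes the two candidate delete delta file names. Because the names go into a `Set`, the two adds collapse to one entry whenever the start and end timestamps are equal:

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DeleteDeltaNameSketch {
  // Stand-ins for CarbonCommonConstants values (assumed, for illustration only).
  static final String FILE_SEPARATOR = "/";
  static final String HYPHEN = "-";
  static final String DELETE_DELTA_FILE_EXT = ".deletedelta";

  // Build the candidate delete delta file names for one block:
  // segmentPath + "/" + blockName + "-" + timestamp + ".deletedelta".
  static Set<String> candidateNames(String segmentPath, String blockName,
      long deltaStartTimestamp, long deltaEndTimeStamp) {
    // A Set deduplicates the two candidates: when start == end (a single
    // delta file exists), only one name survives.
    Set<String> files = new LinkedHashSet<>();
    files.add(segmentPath + FILE_SEPARATOR + blockName + HYPHEN
        + deltaStartTimestamp + DELETE_DELTA_FILE_EXT);
    files.add(segmentPath + FILE_SEPARATOR + blockName + HYPHEN
        + deltaEndTimeStamp + DELETE_DELTA_FILE_EXT);
    return files;
  }

  public static void main(String[] args) {
    System.out.println(candidateNames("/store/t1/Fact/Part0/Segment_0",
        "part-0-0", 100L, 100L));
  }
}
```

When the timestamps differ, however, the set contains a name for each timestamp, which is what the reviewer is questioning.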





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-21 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190649



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -125,12 +125,18 @@ object HorizontalCompaction {
   segLists: util.List[Segment]): Unit = {
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+val startTime = System.nanoTime()
+
 // get the valid segments qualified for update compaction.
    val validSegList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
 
+val endTime = System.nanoTime()
+    LOG.info(s"time taken to get segment list for Horizontal Update Compaction is" +

Review comment:
   I don't think the time in this log will help us; just the segment list can help in some way, not the time
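For reference, the timing pattern being reviewed can be sketched as follows (a minimal illustration, not the patch's actual Scala code; the class and method names are invented): capture `System.nanoTime()` before and after the call and convert the difference to milliseconds for the log message.

```java
public class ElapsedSketch {
  // Convert a nanoTime delta to whole milliseconds for logging.
  static long elapsedMillis(long startNanos, long endNanos) {
    return (endNanos - startNanos) / 1_000_000L;
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    // ... the call under measurement would go here ...
    long end = System.nanoTime();
    System.out.println("time taken: " + elapsedMillis(start, end) + " ms");
  }
}
```

Note that `System.nanoTime()` is a monotonic clock suitable only for measuring elapsed time, not wall-clock time.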









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-21 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r509190746



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -173,11 +179,17 @@ object HorizontalCompaction {
 
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+val startTime = System.nanoTime()
+
    val deletedBlocksList = CarbonDataMergerUtil.getSegListIUDCompactionQualified(segLists,
   absTableIdentifier,
   segmentUpdateStatusManager,
   compactionTypeIUD)
 
+val endTime = System.nanoTime()
+    LOG.info(s"time taken to get deleted block list for Horizontal Delete Compaction is" +

Review comment:
   same as above









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3986: [CARBONDATA-4034] Improve the time-consuming of Horizontal Compaction for update

2020-10-19 Thread GitBox


akashrn5 commented on a change in pull request #3986:
URL: https://github.com/apache/carbondata/pull/3986#discussion_r507834166



##
File path: 
core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
##
@@ -415,44 +415,66 @@ public boolean accept(CarbonFile pathName) {
   }
 
   /**
-   * Return all delta file for a block.
-   * @param segmentId
-   * @param blockName
-   * @return
+   * Get all delete delta files mapped to each block of the specified segment.
+   * First list all .deletedelta files in the segment dir, then loop over the files and
+   * build a map from each block to the .deletedelta files related to it.
+   *
+   * @param seg the segment in which to find blocks
+   * @return a map of block name to its file list
*/
-  public CarbonFile[] getDeleteDeltaFilesList(final Segment segmentId, final String blockName) {
-    String segmentPath = CarbonTablePath.getSegmentPath(
-        identifier.getTablePath(), segmentId.getSegmentNo());
-    CarbonFile segDir =
-        FileFactory.getCarbonFile(segmentPath);
+  public Map<String, List<String>> getDeleteDeltaFilesList(final Segment seg) {
+
+Map blockDeltaStartAndEndTimestampMap = new HashMap<>();

Review comment:
   @shenjiayu17 why exactly do we need this change here? You introduce a map and so on, but we are still doing the list-files call, which is a costly operation. I have some points, please check:
   
   1. There is no need to create these maps, and no need to list files again to fill the map. We already get the block name, and every block will have only one corresponding delete delta file, so the delta files per block will always be one.
   2. The update details contain the block name, actual block name, timestamps and delete delta timestamps. So if the timestamp is not empty, you can form the delta file name yourself from this info and return it from this method.
   
   With the above approach, you avoid the list-files operation, the filtering based on timestamp, and the creation of all these maps. So you can avoid all these changes.
   
   We always keep the horizontal compaction threshold at 1, and we don't change it and don't recommend users to change it, for better performance.
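The reviewer's suggestion can be sketched as follows: derive the delete delta file name purely from the metadata already held in the update details, with no directory listing at all. This is a minimal illustration, not CarbonData code; the `Detail` class and its field names are invented stand-ins for `SegmentUpdateDetails`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeltaFromDetailsSketch {
  // Stand-in for SegmentUpdateDetails: just the fields the lookup needs.
  static class Detail {
    final String segmentName;
    final String blockName;
    final String deleteDeltaEndTimestamp;

    Detail(String segmentName, String blockName, String deleteDeltaEndTimestamp) {
      this.segmentName = segmentName;
      this.blockName = blockName;
      this.deleteDeltaEndTimestamp = deleteDeltaEndTimestamp;
    }
  }

  // Form delete delta file names from metadata only: no list-files call,
  // no timestamp filtering. Entries with an empty timestamp are skipped.
  static List<String> deltaFilesFor(List<Detail> updateDetails, String segmentNo,
      String blockName, String segmentPath) {
    List<String> result = new ArrayList<>();
    for (Detail d : updateDetails) {
      if (d.segmentName.equals(segmentNo) && d.blockName.equals(blockName)
          && !d.deleteDeltaEndTimestamp.isEmpty()) {
        result.add(segmentPath + "/" + blockName + "-"
            + d.deleteDeltaEndTimestamp + ".deletedelta");
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Detail> details = Arrays.asList(
        new Detail("0", "part-0-0", "123"),
        new Detail("1", "part-0-0", "456"));
    System.out.println(deltaFilesFor(details, "0", "part-0-0", "/store/t1/Segment_0"));
  }
}
```

Since every block carries at most one delete delta file under a threshold of 1, the returned list has at most one entry per block, which is what makes the metadata-only derivation sufficient.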

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -173,6 +176,9 @@ object HorizontalCompaction {
 
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Delete Compaction operation is getting valid segments for [$db.$table].")

Review comment:
   same as above

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala
##
@@ -125,6 +125,9 @@ object HorizontalCompaction {
   segLists: util.List[Segment]): Unit = {
 val db = carbonTable.getDatabaseName
 val table = carbonTable.getTableName
+
+    LOG.info(s"Horizontal Update Compaction operation is getting valid segments for [$db.$table].")

Review comment:
   I think this log does not give any useful info here; if you put a log after line 133 and print `validSegList`, it would be a little more useful




