[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275614



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file, removes the temporary 
directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition location
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map indexFileNameMap = (Map) 
ObjectSerializationUtil
+
.convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+List partitionList =
+(List) 
ObjectSerializationUtil.convertStringToObject(partitionPath);
+SegmentFileStore.SegmentFile finalSegmentFile = null;
+boolean isRelativePath;
+String partitionLoc;
+for (String partition : partitionList) {
+  isRelativePath = false;
+  partitionLoc = partition;
+  if (partitionLoc.startsWith(loadModel.getTablePath())) {
+partitionLoc = 
partitionLoc.substring(loadModel.getTablePath().length());
+isRelativePath = true;
+  }
+  SegmentFileStore.SegmentFile segmentFile = new 
SegmentFileStore.SegmentFile();
+  SegmentFileStore.FolderDetails folderDetails = new 
SegmentFileStore.FolderDetails();
+  
folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+  folderDetails.setPartitions(
+  
Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 
1)));
+  folderDetails.setRelative(isRelativePath);
+  folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+  segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+  if (finalSegmentFile != null) {
+finalSegmentFile = finalSegmentFile.merge(segmentFile);
+  } else {
+finalSegmentFile = segmentFile;
+  }
+}
+Objects.requireNonNull(finalSegmentFile);
+String segmentFilesLocation =

Review comment:
   Agreed. Moved this code to SegmentFileStore.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275288



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends 
QueryTest with BeforeAndAfte
 }
   }
 
+  override def afterEach(): Unit = {
+CarbonProperties.getInstance()

Review comment:
   ok. Removed this afterEach.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is a serialied map. Like "carbon.output.partitions.name" and 
"carbon.output.files.name", "carbon.index.files.name" is also serialized. We 
deserialize and use





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is a serialied map. Like "carbon.output.partitions.name" and 
"carbon.output.files.name", "carbon.index.files.name" is also serialized. We 
deserialize and use it before calling SegmentFileStore.writeSegmentFile





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is a serialied map. Like "carbon.output.partitions.name" and 
"carbon.output.files.name", "carbon.index.files.name" is also serialized. We 
deserialize and use in writeSegmentWithoutMergeIndex().





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is a serialied.
   
   "carbon.output.partitions.name"





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450746967



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 throw new IOException(e);
   }
 }
-String segmentFileName = SegmentFileStore.genSegmentFileName(
-loadModel.getSegmentId(), 
String.valueOf(loadModel.getFactTimeStamp()));
 newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+if (isMergeIndex) {

Review comment:
   loadModel.getMetrics().getMergeIndexSize() is filled in 
MergeIndexEventListene.onEvent() when mergeindex is created. So, can't make it 
else case to line 280.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450746967



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 throw new IOException(e);
   }
 }
-String segmentFileName = SegmentFileStore.genSegmentFileName(
-loadModel.getSegmentId(), 
String.valueOf(loadModel.getFactTimeStamp()));
 newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+if (isMergeIndex) {

Review comment:
   loadModel.getMetrics().getMergeIndexSize() filled in 
MergeIndexEventListene.onEvent() when mergeindex is created. So, can't make it 
else case to line 280.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-06-29 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447453448



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends 
QueryTest with BeforeAndAfte
 }
   }
 
+  override def afterEach(): Unit = {
+CarbonProperties.getInstance()

Review comment:
   Actually it is a good practice to do cleanup/reset in afterEach so that 
even if testcase fails abnormally with some exceptions, it gets cleaned/reset 
in afterEach. And any testcases((if exists) below it will not be affected  with 
previous testcase failures.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-06-09 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r436633946



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +317,54 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map IndexFileNameMap = (Map) 
ObjectSerializationUtil

Review comment:
   Modified

##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +317,54 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map IndexFileNameMap = (Map) 
ObjectSerializationUtil
+
.convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+List partitionList =
+(List) 
ObjectSerializationUtil.convertStringToObject(partitionPath);
+SegmentFileStore.SegmentFile finalSegmentFile = null;
+boolean isRelativePath;
+String path;
+for (String partition : partitionList) {
+  isRelativePath = false;
+  path = partition;
+  if (path.startsWith(loadModel.getTablePath())) {
+path = path.substring(loadModel.getTablePath().length());
+isRelativePath = true;
+  }
+  SegmentFileStore.SegmentFile segmentFile = new 
SegmentFileStore.SegmentFile();
+  SegmentFileStore.FolderDetails folderDetails = new 
SegmentFileStore.FolderDetails();
+  Set set = new HashSet();

Review comment:
   Removed this variable itself now. Directly setting it in folder details.
   
`folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));`

##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +317,54 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map IndexFileNameMap = (Map) 
ObjectSerializationUtil
+
.convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+List partitionList =
+(List) 
ObjectSerializationUtil.convertStringToObject(partitionPath);
+SegmentFileStore.SegmentFile finalSegmentFile = null;
+boolean isRelativePath;
+String path;
+for (String partition : partitionList) {
+  isRelativePath = false;
+  path = partition;
+  if (path.startsWith(loadModel.getTablePath())) {
+path = path.substring(loadModel.getTablePath().length());
+isRelativePath = true;
+  }
+  SegmentFileStore.SegmentFile segmentFile = new 
SegmentFileStore.SegmentFile();
+  SegmentFileStore.FolderDetails folderDetails = new 
SegmentFileStore.FolderDetails();
+  Set set = new HashSet();

Review comment:
   Modified

##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +317,54 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 commitJobFinal(context, loadModel, operationContext, carbonTable, 
uniqueId);
   }
 
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, 
CarbonLoadModel loadModel,
+  String segmentFileName, String partitionPath) throws IOException {
+Map IndexFileNameMap = (Map) 
ObjectSerializationUtil
+
.convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+List partitionList =
+(List) 
ObjectSerializationUtil.convertStringToObject(partitionPath);
+SegmentFileStore.SegmentFile finalSegmentFile = null;
+boolean isRelativePath;
+String path;
+for (String partition : partitionList) {
+  isRelativePath = false;
+  path = partition;
+  if (path.startsWith(loadModel.getTablePath())) {
+path = path.substring(loadModel.getTablePath().length());
+isRelativePath = true;
+  }
+  SegmentFileStore.SegmentFile segmentFile = new 
SegmentFileStore.SegmentFile();
+  SegmentFileStore.FolderDetails folderDetails = new 
SegmentFileStore.FolderDetails();
+  Set set = new HashSet();
+