[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and

2020-07-07 Thread GitBox


vikramahuja1001 commented on a change in pull request #3802:
URL: https://github.com/apache/carbondata/pull/3802#discussion_r451286366



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##
@@ -37,7 +38,6 @@
 import org.apache.carbondata.core.util.path.CarbonTablePath;
 import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
 import org.apache.carbondata.processing.util.CarbonLoaderUtil;
-

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275614



##
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -302,6 +318,61 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
     commitJobFinal(context, loadModel, operationContext, carbonTable, uniqueId);
   }
 
+  /**
+   * Method to create and write the segment file; removes the temporary directories from all the
+   * respective partition directories. This method is invoked only when {@link
+   * CarbonCommonConstants#CARBON_MERGE_INDEX_IN_SEGMENT} is disabled.
+   * @param context Job context
+   * @param loadModel Load model
+   * @param segmentFileName Segment file name to write
+   * @param partitionPath Serialized list of partition locations
+   * @throws IOException
+   */
+  @SuppressWarnings("unchecked")
+  private void writeSegmentWithoutMergeIndex(JobContext context, CarbonLoadModel loadModel,
+      String segmentFileName, String partitionPath) throws IOException {
+    Map<String, String> indexFileNameMap = (Map<String, String>) ObjectSerializationUtil
+        .convertStringToObject(context.getConfiguration().get("carbon.index.files.name"));
+    List<String> partitionList =
+        (List<String>) ObjectSerializationUtil.convertStringToObject(partitionPath);
+    SegmentFileStore.SegmentFile finalSegmentFile = null;
+    boolean isRelativePath;
+    String partitionLoc;
+    for (String partition : partitionList) {
+      isRelativePath = false;
+      partitionLoc = partition;
+      if (partitionLoc.startsWith(loadModel.getTablePath())) {
+        partitionLoc = partitionLoc.substring(loadModel.getTablePath().length());
+        isRelativePath = true;
+      }
+      SegmentFileStore.SegmentFile segmentFile = new SegmentFileStore.SegmentFile();
+      SegmentFileStore.FolderDetails folderDetails = new SegmentFileStore.FolderDetails();
+      folderDetails.setFiles(Collections.singleton(indexFileNameMap.get(partition)));
+      folderDetails.setPartitions(
+          Collections.singletonList(partitionLoc.substring(partitionLoc.indexOf("/") + 1)));
+      folderDetails.setRelative(isRelativePath);
+      folderDetails.setStatus(SegmentStatus.SUCCESS.getMessage());
+      segmentFile.getLocationMap().put(partitionLoc, folderDetails);
+      if (finalSegmentFile != null) {
+        finalSegmentFile = finalSegmentFile.merge(segmentFile);
+      } else {
+        finalSegmentFile = segmentFile;
+      }
+    }
+    Objects.requireNonNull(finalSegmentFile);
+    String segmentFilesLocation =

Review comment:
   Agreed. Moved this code to SegmentFileStore.
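For context, the hunk above builds one SegmentFile per partition, makes each partition path relative to the table path when possible, and folds the per-partition entries into a single final segment file via merge(). A stripped-down sketch of that fold, using hypothetical stand-in types instead of the real SegmentFileStore classes:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: the real SegmentFileStore.SegmentFile / FolderDetails
// carry more state (status, relative flag, index file sets) than this sketch.
public class SegmentMergeSketch {

  // Strip the table path prefix so partition locations are stored relative
  // to the table, mirroring the startsWith/substring logic in the hunk.
  static String relativize(String tablePath, String partition) {
    return partition.startsWith(tablePath)
        ? partition.substring(tablePath.length())
        : partition;
  }

  // Merge one (partition -> index file) entry per partition into a single
  // location map, the way the loop merges per-partition SegmentFile objects.
  static Map<String, String> buildLocationMap(
      String tablePath, List<String> partitions, Map<String, String> indexFileNameMap) {
    Map<String, String> locationMap = new HashMap<>();
    for (String partition : partitions) {
      locationMap.put(relativize(tablePath, partition), indexFileNameMap.get(partition));
    }
    return locationMap;
  }
}
```

Here the per-partition value is just the index file name; the real FolderDetails also tracks the segment status and whether the path is relative.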









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r451275288



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala
##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     }
   }
 
+  override def afterEach(): Unit = {
+    CarbonProperties.getInstance()

Review comment:
   ok. Removed this afterEach.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-655086214


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3322/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#issuecomment-655062212


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3325/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#issuecomment-655060959


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1585/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-655059772


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1583/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3829:
URL: https://github.com/apache/carbondata/pull/3829#issuecomment-655023313


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1581/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3829:
URL: https://github.com/apache/carbondata/pull/3829#issuecomment-655022713


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3320/
   







[GitHub] [carbondata] nihal0107 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654983835


   retest this please







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: String, path: String, i
     if (size.isDefined) {
       dataSize = dataSize + java.lang.Long.parseLong(size.get)
     }
+    val indexSize = map.get("carbon.indexsize")
+    if (indexSize.isDefined) {
+      indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+    }
+    val indexFiles = map.get("carbon.index.files.name")
+    if (indexFiles.isDefined) {
+      indexFilesName = indexFiles.get

Review comment:
   It is a serialized map. Like "carbon.output.partitions.name" and "carbon.output.files.name", "carbon.index.files.name" is also serialized. We deserialize it and use it before calling SegmentFileStore.writeSegmentFile.
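As a hedged illustration of what carrying a serialized map in a string configuration property involves (the actual encoding used by CarbonData's ObjectSerializationUtil is an assumption here; it may compress or encode differently):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Illustration only: assumes Java serialization plus an ASCII-safe encoding,
// so a Map can round-trip through a plain string property such as
// "carbon.index.files.name" in a Hadoop Configuration.
public class SerializedMapSketch {

  // Serialize any Serializable object to a Base64 string.
  static String convertObjectToString(Serializable obj) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);
    }
    return Base64.getEncoder().encodeToString(bos.toByteArray());
  }

  // Decode the string back into the original object.
  @SuppressWarnings("unchecked")
  static <T> T convertStringToObject(String s) throws IOException, ClassNotFoundException {
    byte[] bytes = Base64.getDecoder().decode(s);
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) ois.readObject();
    }
  }
}
```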









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654963937


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1580/
   







[GitHub] [carbondata] ajantha-bhat opened a new pull request #3829: [WIP] Fix maintable load failure in concurrent load and compaction sceneario

2020-07-07 Thread GitBox


ajantha-bhat opened a new pull request #3829:
URL: https://github.com/apache/carbondata/pull/3829


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654937795


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3319/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3828:
URL: https://github.com/apache/carbondata/pull/3828#issuecomment-654910077


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1579/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3828:
URL: https://github.com/apache/carbondata/pull/3828#issuecomment-654907957


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3318/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3786:
URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654872715


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3316/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872523


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1577/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654872025


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3315/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3786:
URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654870645


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1578/
   







[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450849269



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.withCsvInput();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.buildParquetReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withParquetPath(filePath);
+    return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader;
+    if (this.isDirectory) {
+      if (this.fileList == null || this.fileList.size() == 0) {
+        File[] dataFiles = new File(this.filePath).listFiles();
+        if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide" +
+              "the correct folder location.");
+        }
+        parquetReader = ParquetReader.builder(avroReadSupport,

Review comment:
   Handled the scenario: before calling write(), all the file formats are checked. If they are not of the same type, an exception is thrown. These cases are handled for Parquet, ORC, and Avro files.
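A minimal, hypothetical sketch of such a uniform-format check (the class and method names are invented; the PR may detect the format by reading file headers rather than extensions):

```java
import java.util.List;

// Hypothetical pre-write check: reject the load if the input files do not all
// share one format, as described above for Parquet, ORC and Avro.
public class FormatCheckSketch {

  // Extract the lowercase extension, failing on extension-less files.
  static String extensionOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("File has no extension: " + fileName);
    }
    return fileName.substring(dot + 1).toLowerCase();
  }

  // Throw if any file's extension differs from the expected format.
  static void validateSameFormat(List<String> files, String expected) {
    for (String file : files) {
      if (!extensionOf(file).equals(expected)) {
        throw new IllegalArgumentException(
            "Expected only ." + expected + " files, found: " + file);
      }
    }
  }
}
```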









[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450848843



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.withCsvInput();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList) {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.buildParquetReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withParquetPath(filePath);
+    return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader;
+    if (this.isDirectory) {
+      if (this.fileList == null || this.fileList.size() == 0) {
+        File[] dataFiles = new File(this.filePath).listFiles();
+        if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide" +
+              "the correct folder location.");
+        }
+        parquetReader = ParquetReader.builder(avroReadSupport,
+            new Path(String.valueOf(dataFiles[0]))).build();
+      } else {
+        parquetReader = ParquetReader.builder(avroReadSupport,

Review comment:
   Checked the schema at the time of building the carbon writer. If the files do not all have the same schema, an exception is thrown.
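A minimal sketch of that builder-time schema validation, assuming each file's schema has already been extracted to a string by the corresponding Parquet/ORC/Avro reader (the class and method names here are hypothetical):

```java
import java.util.List;

// Hypothetical fail-fast check: all input files must expose an identical
// schema before the carbon writer is built, as the comment above describes.
public class SchemaCheckSketch {

  // Compare every schema against the first; mismatch aborts the build.
  static void validateSameSchema(List<String> schemas) {
    if (schemas.isEmpty()) {
      throw new IllegalArgumentException("No input files found");
    }
    String first = schemas.get(0);
    for (String schema : schemas) {
      if (!schema.equals(first)) {
        throw new IllegalArgumentException("All files must have the same schema");
      }
    }
  }
}
```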


[GitHub] [carbondata] nihal0107 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#issuecomment-654840974


   > @nihal0107 Please remove unused binary files from this PR
   
   Those are not unused binary files. They are Parquet, ORC, or Avro files which I am using inside the UTs.







[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450845011



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +613,332 @@ public CarbonWriterBuilder withJsonInput(Schema 
carbonSchema) {
 return this;
   }
 
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.withCsvInput();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List 
fileList) {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws 
IOException {
+if (filePath.length() == 0) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+this.filePath = filePath;
+this.isDirectory = new File(filePath).isDirectory();
+this.writerType = WRITER_TYPE.PARQUET;
+this.buildParquetReader();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath, List 
fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withParquetPath(filePath);
+return this;
+  }
+
+  private void buildParquetReader() throws IOException {
+AvroReadSupport avroReadSupport = new AvroReadSupport<>();
+ParquetReader parquetReader;
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+  throw new RuntimeException("No Parquet file found at given location. 
Please provide" +
+  "the correct folder location.");
+}
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(String.valueOf(dataFiles[0]))).build();
+  } else {
+parquetReader = ParquetReader.builder(avroReadSupport,
+new Path(this.filePath + "/" + this.fileList.get(0))).build();
+  }
+} else {
+  parquetReader = ParquetReader.builder(avroReadSupport,
+  new Path(this.filePath)).build();
+}
+this.avroSchema = parquetReader.read().getSchema();
+this.schema = 
AvroCarbonWriter.getCarbonSchemaFromAvroSchema(this.avroSchema);
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading ORC files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withOrcPath(String filePath) throws IOException {
+    if (filePath.length() == 0) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+    this.filePath = filePath;
+    this.isDirectory = new File(filePath).isDirectory();
+    this.writerType = WRITER_TYPE.ORC;
+    Map<String, String> options = new HashMap<>();
+    options.put("complex_delimiter_level_1", "#");
+    options.put("complex_delimiter_level_2", "$");
+    options.put("complex_delimiter_level_3", "@");
+    this.withLoadOptions(options);
+    this.buildOrcReader();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts an ORC files directory and a
+   * list of files which have to be loaded.
+   *
+   * @param filePath directory where the ORC files exist.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withOrcPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withOrcPath(filePath);
+    return this;
+  }
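The three load options set in withOrcPath() register `#`, `$` and `@` as the level-1/2/3 complex-type delimiters: nested ORC values are flattened into strings joined by these delimiters before being handed to the CSV writer. A hedged sketch of that flattening idea (this helper is illustrative, not the SDK's actual parseOrcObject):

```java
import java.util.List;

public class ComplexFlattener {
    // Delimiters matching the load options above: one separator per nesting level.
    private static final String[] DELIMITERS = {"#", "$", "@"};

    // Join the children of a nested collection with the delimiter of the
    // current level, recursing one level deeper for each nested collection.
    static String flatten(Object value, int level) {
        if (value instanceof List) {
            List<?> children = (List<?>) value;
            StringBuilder joined = new StringBuilder();
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    joined.append(DELIMITERS[level]);
                }
                joined.append(flatten(children.get(i), level + 1));
            }
            return joined.toString();
        }
        return String.valueOf(value);
    }
}
```

Using a distinct delimiter per level is what lets the loader later split a flattened string back into struct/array levels without ambiguity.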
+
+  // build orc reader and convert orc schema to carbon schema.
+  private void buildOrcReader() throws 

[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844563



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.avro.AvroReadSupport;
+import org.apache.parquet.hadoop.ParquetReader;
+
+/**
+ * Implementation to write parquet rows in avro format to carbondata file.
+ */
+public class ParquetCarbonWriter extends AvroCarbonWriter {
+  private AvroCarbonWriter avroCarbonWriter = null;
+  private String filePath = "";
+  private boolean isDirectory = false;
+  private List<String> fileList;
+
+  ParquetCarbonWriter(AvroCarbonWriter avroCarbonWriter) {
+this.avroCarbonWriter = avroCarbonWriter;
+  }
+
+  @Override
+  public void setFilePath(String filePath) {
+this.filePath = filePath;
+  }
+
+  @Override
+  public void setIsDirectory(boolean isDirectory) {
+this.isDirectory = isDirectory;
+  }
+
+  @Override
+  public void setFileList(List<String> fileList) {
+    this.fileList = fileList;
+  }
+
+  /**
+   * Load data of all parquet files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+if (this.filePath.length() == 0) {
+  throw new RuntimeException("'withParquetPath()' " +
+  "must be called to support load parquet files");
+}
+if (this.avroCarbonWriter == null) {
+  throw new RuntimeException("avro carbon writer can not be null");
+}
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide " +
+              "the correct folder location.");
+}
+Arrays.sort(dataFiles);
+for (File dataFile : dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  } else {
+for (String file : this.fileList) {
+  this.loadSingleFile(new File(this.filePath + "/" + file));
+}
+  }
+} else {
+  this.loadSingleFile(new File(this.filePath));
+}
+  }
+
+  private void loadSingleFile(File file) throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader = ParquetReader.builder(avroReadSupport,
+        new Path(String.valueOf(file))).withConf(new Configuration()).build();
+    GenericRecord genericRecord = null;

Review comment:
   changed as per suggestion. 
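The directory/list/single-file dispatch in the quoted write() can be reduced to a pure function over its inputs; a sketch (class and method names hypothetical) returning the paths the writer would load, in order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoadOrder {
    // Sketch of the dispatch in write(): an explicit file list wins and is
    // loaded in the given order; otherwise a directory is loaded in sorted
    // order; otherwise the path itself is the single file to load.
    static List<String> filesToLoad(String path, boolean isDirectory,
                                    List<String> fileList, String[] dirListing) {
        List<String> result = new ArrayList<>();
        if (!isDirectory) {
            result.add(path);
            return result;
        }
        if (fileList != null && !fileList.isEmpty()) {
            for (String name : fileList) {
                result.add(path + "/" + name);
            }
            return result;
        }
        if (dirListing == null || dirListing.length == 0) {
            throw new RuntimeException("No Parquet file found at given location.");
        }
        String[] sorted = dirListing.clone();
        Arrays.sort(sorted);  // write() sorts directory entries before loading
        for (String name : sorted) {
            result.add(path + "/" + name);
        }
        return result;
    }
}
```

Sorting only in the directory branch matches the quoted code: Arrays.sort(dataFiles) appears in write(), while an explicit file list is loaded as given.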





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844071



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private String filePath = "";
+  private Reader orcReader = null;
+  private boolean isDirectory = false;
+  private List<String> fileList;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter) {
+this.csvCarbonWriter = csvCarbonWriter;
+  }
+
+  @Override
+  public void setFilePath(String filePath) {
+this.filePath = filePath;
+  }
+
+  @Override
+  public void setIsDirectory(boolean isDirectory) {
+this.isDirectory = isDirectory;
+  }
+
+  @Override
+  public void setFileList(List<String> fileList) {
+    this.fileList = fileList;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.filePath.length() == 0) {
+      throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files");
+}
+if (this.csvCarbonWriter == null) {
+  throw new RuntimeException("csv carbon writer can not be null");
+}
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No ORC file found at given location. Please provide " +
+              "the correct folder location.");
+}
+for (File dataFile : dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  } else {
+for (String file : this.fileList) {
+  this.loadSingleFile(new File(this.filePath + "/" + file));
+}
+  }
+} else {
+  this.loadSingleFile(new File(this.filePath));
+}
+  }
+
+  private void loadSingleFile(File file) throws IOException {
+orcReader = OrcFile.createReader(new Path(String.valueOf(file)),
+OrcFile.readerOptions(new Configuration()));
+ObjectInspector objectInspector = orcReader.getObjectInspector();
+RecordReader recordReader = orcReader.rows();
+if (objectInspector instanceof StructObjectInspector) {
+  StructObjectInspector structObjectInspector =
+  (StructObjectInspector) orcReader.getObjectInspector();
+      while (recordReader.hasNext()) {
+        Object record = recordReader.next(null); // to avoid duplication.
+        List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record);
+        for (int i = 0; i < valueList.size(); i++) {
+          valueList.set(i, parseOrcObject(valueList.get(i), 0));
+        }
+        this.csvCarbonWriter.write(valueList.toArray());
+      }
+    } else {
+      while (recordReader.hasNext()) {
+        Object record = recordReader.next(null); // to avoid duplication.
+        this.csvCarbonWriter.write(new Object[]{parseOrcObject(record, 0)});
+      }
+    }
+  }
+
+  private String parseOrcObject(Object obj, int level) {
+    if (obj instanceof OrcStruct) {
+      Objects.requireNonNull(orcReader);
+      StructObjectInspector structObjectInspector =
+          (StructObjectInspector) orcReader.getObjectInspector();
+      List<Object> value = structObjectInspector.getStructFieldsDataAsList(obj);
+      for (int i = 0; i < value.size(); i++) {
+        value.set(i, 

[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450844187



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ParquetCarbonWriter.java
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.avro.generic.GenericRecord;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.parquet.avro.AvroReadSupport;
+import org.apache.parquet.hadoop.ParquetReader;
+
+/**
+ * Implementation to write parquet rows in avro format to carbondata file.
+ */
+public class ParquetCarbonWriter extends AvroCarbonWriter {
+  private AvroCarbonWriter avroCarbonWriter = null;
+  private String filePath = "";
+  private boolean isDirectory = false;
+  private List<String> fileList;
+
+  ParquetCarbonWriter(AvroCarbonWriter avroCarbonWriter) {
+this.avroCarbonWriter = avroCarbonWriter;
+  }
+
+  @Override
+  public void setFilePath(String filePath) {
+this.filePath = filePath;
+  }
+
+  @Override
+  public void setIsDirectory(boolean isDirectory) {
+this.isDirectory = isDirectory;
+  }
+
+  @Override
+  public void setFileList(List<String> fileList) {
+    this.fileList = fileList;
+  }
+
+  /**
+   * Load data of all parquet files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+if (this.filePath.length() == 0) {
+  throw new RuntimeException("'withParquetPath()' " +
+  "must be called to support load parquet files");
+}
+if (this.avroCarbonWriter == null) {
+  throw new RuntimeException("avro carbon writer can not be null");
+}
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No Parquet file found at given location. Please provide " +
+              "the correct folder location.");
+}
+Arrays.sort(dataFiles);
+for (File dataFile : dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  } else {
+for (String file : this.fileList) {
+  this.loadSingleFile(new File(this.filePath + "/" + file));
+}
+  }
+} else {
+  this.loadSingleFile(new File(this.filePath));
+}
+  }
+
+  private void loadSingleFile(File file) throws IOException {
+    AvroReadSupport<GenericRecord> avroReadSupport = new AvroReadSupport<>();
+    ParquetReader<GenericRecord> parquetReader = ParquetReader.builder(avroReadSupport,
+        new Path(String.valueOf(file))).withConf(new Configuration()).build();
+    GenericRecord genericRecord = null;
+    while ((genericRecord = parquetReader.read()) != null) {
+      System.out.println(genericRecord);

Review comment:
   removed.









[GitHub] [carbondata] nihal0107 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-07-07 Thread GitBox


nihal0107 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r450843943



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.ql.io.orc.OrcFile;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private String filePath = "";
+  private Reader orcReader = null;
+  private boolean isDirectory = false;
+  private List<String> fileList;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter) {
+this.csvCarbonWriter = csvCarbonWriter;
+  }
+
+  @Override
+  public void setFilePath(String filePath) {
+this.filePath = filePath;
+  }
+
+  @Override
+  public void setIsDirectory(boolean isDirectory) {
+this.isDirectory = isDirectory;
+  }
+
+  @Override
+  public void setFileList(List<String> fileList) {
+    this.fileList = fileList;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.filePath.length() == 0) {
+      throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files");
+}
+if (this.csvCarbonWriter == null) {
+  throw new RuntimeException("csv carbon writer can not be null");
+}
+if (this.isDirectory) {
+  if (this.fileList == null || this.fileList.size() == 0) {
+File[] dataFiles = new File(this.filePath).listFiles();
+if (dataFiles == null || dataFiles.length == 0) {
+          throw new RuntimeException("No ORC file found at given location. Please provide " +
+              "the correct folder location.");
+}
+for (File dataFile : dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  } else {
+for (String file : this.fileList) {
+  this.loadSingleFile(new File(this.filePath + "/" + file));
+}
+  }
+} else {
+  this.loadSingleFile(new File(this.filePath));
+}
+  }
+
+  private void loadSingleFile(File file) throws IOException {
+orcReader = OrcFile.createReader(new Path(String.valueOf(file)),
+OrcFile.readerOptions(new Configuration()));
+ObjectInspector objectInspector = orcReader.getObjectInspector();
+RecordReader recordReader = orcReader.rows();
+if (objectInspector instanceof StructObjectInspector) {
+  StructObjectInspector structObjectInspector =
+  (StructObjectInspector) orcReader.getObjectInspector();
+      while (recordReader.hasNext()) {
+        Object record = recordReader.next(null); // to avoid duplication.
+        List<Object> valueList = structObjectInspector.getStructFieldsDataAsList(record);
+        for (int i = 0; i < valueList.size(); i++) {
+          valueList.set(i, parseOrcObject(valueList.get(i), 0));
+        }
+        this.csvCarbonWriter.write(valueList.toArray());
+      }
+    } else {
+      while (recordReader.hasNext()) {
+        Object record = recordReader.next(null); // to avoid duplication.
+        this.csvCarbonWriter.write(new Object[]{parseOrcObject(record, 0)});
+      }
+    }
+  }
+
+  private String parseOrcObject(Object obj, int level) {

Review comment:
   done








[GitHub] [carbondata] QiangCai opened a new pull request #3828: [CARBONDATA-3889] Cleanup typo code for carbondata-core module

2020-07-07 Thread GitBox


QiangCai opened a new pull request #3828:
URL: https://github.com/apache/carbondata/pull/3828


### Why is this PR needed?
There are many typos in carbondata-core module

### What changes were proposed in this PR?
   Cleanup typo code for carbondata-core module
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] akashrn5 commented on pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and Fix for Concurrent

2020-07-07 Thread GitBox


akashrn5 commented on pull request #3802:
URL: https://github.com/apache/carbondata/pull/3802#issuecomment-654791824


   @vikramahuja1001 please mention the issue reproduction steps in the JIRA, and in 
the PR description give the proper issue, root cause, and proposed solution.







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3802: [CARBONDATA-3885] [CARBONDATA-3884] Delete Stale Segment files from Metadata folders when SI segments are deleted and Fix fo

2020-07-07 Thread GitBox


akashrn5 commented on a change in pull request #3802:
URL: https://github.com/apache/carbondata/pull/3802#discussion_r450316210



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/CarbonInternalLoaderUtil.java
##
@@ -37,7 +38,6 @@
 import org.apache.carbondata.core.util.path.CarbonTablePath;
 import org.apache.carbondata.processing.loading.model.CarbonLoadModel;
 import org.apache.carbondata.processing.util.CarbonLoaderUtil;
-

Review comment:
   revert this









[GitHub] [carbondata] akashrn5 commented on pull request #3786: [CARBONDATA-3842] Fix incorrect results on mv with limit (Missed code during mv refcatory)

2020-07-07 Thread GitBox


akashrn5 commented on pull request #3786:
URL: https://github.com/apache/carbondata/pull/3786#issuecomment-654788035


   retest this please







[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


akashrn5 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654787100


   retest this please







[GitHub] [carbondata] asfgit closed pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception

2020-07-07 Thread GitBox


asfgit closed pull request #3808:
URL: https://github.com/apache/carbondata/pull/3808


   







[GitHub] [carbondata] akashrn5 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


akashrn5 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654776383


   LGTM







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is a serialized map. Like "carbon.output.partitions.name" and 
"carbon.output.files.name", "carbon.index.files.name" is also serialized. We 
deserialize it and use it in writeSegmentWithoutMergeIndex().
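The per-task map described here carries sizes as strings. The driver-side accumulation shown in the diff (summing "carbon.indexsize", keeping the serialized "carbon.index.files.name" payload) can be sketched as follows; the key names come from the diff, while the surrounding class shape is a hypothetical stand-in for the commit protocol:

```java
import java.util.Map;

public class IndexMetrics {
    private long indexLen = 0;
    private String indexFilesName = null;

    // Each task reports its index size and index file names through a
    // Map<String, String>; the driver sums the sizes and keeps the
    // (still serialized) file-name payload for later deserialization.
    void accumulate(Map<String, String> taskMap) {
        String indexSize = taskMap.get("carbon.indexsize");
        if (indexSize != null) {
            indexLen += Long.parseLong(indexSize);
        }
        String indexFiles = taskMap.get("carbon.index.files.name");
        if (indexFiles != null) {
            indexFilesName = indexFiles;
        }
    }

    long getIndexLen() { return indexLen; }

    String getIndexFilesName() { return indexFilesName; }
}
```

Carrying the values as strings keeps the task-to-driver channel a flat string map, at the cost of parsing on the driver side.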









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450765502



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
##
@@ -253,6 +255,14 @@ case class CarbonSQLHadoopMapReduceCommitProtocol(jobId: 
String, path: String, i
 if (size.isDefined) {
   dataSize = dataSize + java.lang.Long.parseLong(size.get)
 }
+val indexSize = map.get("carbon.indexsize")
+if (indexSize.isDefined) {
+  indexLen = indexLen + java.lang.Long.parseLong(indexSize.get)
+}
+val indexFiles = map.get("carbon.index.files.name")
+if (indexFiles.isDefined) {
+  indexFilesName = indexFiles.get

Review comment:
   It is serialized, like "carbon.output.partitions.name".









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3827:
URL: https://github.com/apache/carbondata/pull/3827#issuecomment-654748600


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1576/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3827:
URL: https://github.com/apache/carbondata/pull/3827#issuecomment-654746270


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3314/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450746967



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 throw new IOException(e);
   }
 }
-String segmentFileName = SegmentFileStore.genSegmentFileName(
-loadModel.getSegmentId(), 
String.valueOf(loadModel.getFactTimeStamp()));
 newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+if (isMergeIndex) {

Review comment:
   loadModel.getMetrics().getMergeIndexSize() is filled in 
MergeIndexEventListener.onEvent() when the merge index is created. So, we can't 
make it an else case of line 280.









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.

2020-07-07 Thread GitBox


VenuReddy2103 commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r450746967



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##
@@ -282,10 +296,12 @@ private void commitJobForPartition(JobContext context, 
boolean overwriteSet,
 throw new IOException(e);
   }
 }
-String segmentFileName = SegmentFileStore.genSegmentFileName(
-loadModel.getSegmentId(), 
String.valueOf(loadModel.getFactTimeStamp()));
 newMetaEntry.setSegmentFile(segmentFileName + CarbonTablePath.SEGMENT_EXT);
-newMetaEntry.setIndexSize("" + loadModel.getMetrics().getMergeIndexSize());
+if (isMergeIndex) {

Review comment:
   loadModel.getMetrics().getMergeIndexSize() is filled in 
MergeIndexEventListener.onEvent() when the merge index is created. So, we can't 
make it an else case of line 280.









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3808:
URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654734465


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1575/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3808:
URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654733164


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3313/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654732910


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1574/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3811: [CARBONDATA-3874] segment mismatch between maintable and SI table when load with concurrency

2020-07-07 Thread GitBox


CarbonDataQA1 commented on pull request #3811:
URL: https://github.com/apache/carbondata/pull/3811#issuecomment-654730812


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3312/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3817: [CARBONDATA-3845] Bucket table creation fails with exception for empt…

2020-07-07 Thread GitBox


Indhumathi27 commented on pull request #3817:
URL: https://github.com/apache/carbondata/pull/3817#issuecomment-654728062


   LGTM







[GitHub] [carbondata] QiangCai opened a new pull request #3827: [CARBONDATA-3889] Cleanup code for carbondata-hadoop module

2020-07-07 Thread GitBox


QiangCai opened a new pull request #3827:
URL: https://github.com/apache/carbondata/pull/3827


### Why is this PR needed?
The carbondata-hadoop module needs code cleanup.

### What changes were proposed in this PR?
Clean up code in the carbondata-hadoop module.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- No
   
   
   







[GitHub] [carbondata] akashrn5 commented on pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception

2020-07-07 Thread GitBox


akashrn5 commented on pull request #3808:
URL: https://github.com/apache/carbondata/pull/3808#issuecomment-654663162


   LGTM







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3808: [CARBONDATA-3873] Secondary index compaction with maintable clean files causing exception

2020-07-07 Thread GitBox


akashrn5 commented on a change in pull request #3808:
URL: https://github.com/apache/carbondata/pull/3808#discussion_r450658081



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/load/Compactor.scala
##
@@ -121,10 +124,20 @@ object Compactor {
           segmentIdToLoadStartTimeMapping(validSegments.head),
           SegmentStatus.SUCCESS,
           carbonLoadModelForMergeDataFiles.getFactTimeStamp, rebuiltSegments.toList.asJava)
-
+        siCompactionIndexList ::= indexCarbonTable
       } catch {
         case ex: Exception =>
           LOGGER.error(s"Compaction failed for SI table ${secondaryIndex.indexName}", ex)
+          // If any compaction is failed then make all SI disabled which are success.
+          // They will be enabled in next load
+          siCompactionIndexList.foreach { indexCarbonTable =>
+            sparkSession.sql(
+              s"""
+                 | ALTER TABLE ${carbonLoadModel.getDatabaseName}.${indexCarbonTable.getTableName}
+                 | SET
+                 | SERDEPROPERTIES ('isSITableEnabled' = 'false')

Review comment:
   move this line above
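The diff above accumulates each successfully compacted SI table and, when a later compaction throws, disables all of them so they are re-enabled on the next load. A minimal, Spark-free sketch of that rollback pattern (the `compact` and `disable` callbacks here are hypothetical stand-ins for the real compaction call and the `ALTER TABLE ... SET SERDEPROPERTIES` statement):

```scala
object SiCompactionRollback {
  // Compact each table in order; on failure, roll back by disabling the
  // tables that already succeeded. Returns Right(succeeded tables in order)
  // or Left(error message) after rollback.
  def compactAll(
      tables: Seq[String],
      compact: String => Unit,
      disable: String => Unit): Either[String, Seq[String]] = {
    var done: List[String] = Nil
    try {
      tables.foreach { t =>
        compact(t)
        // Record each success so it can be rolled back if a later table fails
        done ::= t
      }
      Right(done.reverse)
    } catch {
      case ex: Exception =>
        // Disable every SI table that compacted successfully before the
        // failure; they would be re-enabled on the next load
        done.foreach(disable)
        Left(ex.getMessage)
    }
  }
}
```

With this shape, the rollback only touches tables that actually succeeded, which mirrors why the diff appends to `siCompactionIndexList` only after a successful compaction.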




