[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478826107



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,
+      indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val mainCarbonTable = metaStore
+      .lookupRelation(Some(databaseName), tableNameOp)(sparkSession)
+      .asInstanceOf[CarbonRelation].carbonTable
+
+    val carbonLoadModel = new CarbonLoadModel
+    carbonLoadModel.setDatabaseName(databaseName)
+    carbonLoadModel.setTableName(tableNameOp)
+    carbonLoadModel.setTablePath(mainCarbonTable.getTablePath)
+    val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath)
+    carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager
+      .readTableStatusFile(tableStatusFilePath).toList.asJava)
+    carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable))
+
+    val indexMetadata = mainCarbonTable.getIndexMetadata
+    val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+    if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+        null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
+      val indexTables = indexMetadata.getIndexesMap
+        .get(secondaryIndexProvider).keySet().asScala
+      // if there are no index tables for a given fact table do not perform any action
+      if (indexTables.nonEmpty) {
+        val mainTableDetails = if (segments.isEmpty) {
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList
+          // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+        } else {
+          // get segments for main table
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList.filter(

[GitHub] [carbondata] nihal0107 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.

2020-08-27 Thread GitBox


nihal0107 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-682327952


   retest this please.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478825910



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)

Review comment:
   Why "readLoadMetadata" is till there for maintable?
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682327208


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478837180



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)

Review comment:
   added it from the caller





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478824388



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }
 
+  private void validateCsvFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+    if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+      throw new RuntimeException("CSV files can't be empty.");
+    }
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+        csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+            -1, this.hadoopConf));
+      } catch (IllegalArgumentException ex) {
+        if (ex.getCause() instanceof FileNotFoundException) {
+          throw new FileNotFoundException("File " + dataFile +
+              " not found to build carbon writer.");
+        }
+        throw ex;
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withCsvInput();
+    this.validateCsvFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+      } catch (FileNotFoundException ex) {
+        throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+      } catch (ParseException ex) {
+        throw new RuntimeException("File " + dataFile + " is not in json format.");
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withJsonInput();
+    this.validateJsonFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts JSON file directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the json file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withJsonPath(filePath);
+    return this;
+  }
+
+  private void validateFilePath(String filePath) {
+    if (StringUtils.isEmpty(filePath)) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.validateParquetFiles();
+    return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+    if (this.hadoopConf == null) {
+      this.hadoopConf = new Configuration(FileFactory.getConfiguration());
+    }
+    CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf);
+    this.isDirectory = carbonFile.isDirectory();
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts parquet files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the parquet file exists.
+   * @param fileList list of files which has to be 

[GitHub] [carbondata] kunal642 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table

2020-08-27 Thread GitBox


kunal642 commented on pull request #3904:
URL: https://github.com/apache/carbondata/pull/3904#issuecomment-682321292


   @akashrn5 please fix the build



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478831360



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,
+      indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val mainCarbonTable = metaStore
+      .lookupRelation(Some(databaseName), tableNameOp)(sparkSession)
+      .asInstanceOf[CarbonRelation].carbonTable
+
+    val carbonLoadModel = new CarbonLoadModel
+    carbonLoadModel.setDatabaseName(databaseName)
+    carbonLoadModel.setTableName(tableNameOp)
+    carbonLoadModel.setTablePath(mainCarbonTable.getTablePath)
+    val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath)
+    carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager
+      .readTableStatusFile(tableStatusFilePath).toList.asJava)
+    carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable))
+
+    val indexMetadata = mainCarbonTable.getIndexMetadata
+    val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+    if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+        null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
+      val indexTables = indexMetadata.getIndexesMap
+        .get(secondaryIndexProvider).keySet().asScala
+      // if there are no index tables for a given fact table do not perform any action
+      if (indexTables.nonEmpty) {
+        val mainTableDetails = if (segments.isEmpty) {
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList
+          // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+        } else {
+          // get segments for main table

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478286002



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -17,25 +17,19 @@
 
 package org.apache.carbondata.sdk.file;
 
+import java.io.FileNotFoundException;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Objects;
-import java.util.Set;
-import java.util.TreeMap;
-import java.util.UUID;
+import java.util.*;

Review comment:
   wildcard import. same as above





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478343705



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
       // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
       // which will skip Conversion Step.
       loadModel.setLoadWithoutConverterStep(true);
-      return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+      AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,
+          hadoopConf, this.avroSchema);
+      if (!StringUtils.isEmpty(filePath)) {

Review comment:
   This condition never seems to fail, does it? The same applies to the json and csv writer cases below.
   We do not have a similar check for the parquet and orc types below.
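
   A minimal sketch of the point above, with hypothetical names standing in for the SDK classes: once withAvroPath(filePath) has validated and stored a non-empty path, a later !StringUtils.isEmpty(filePath) guard cannot fail, so checking dataFiles itself would cover all writer types uniformly.

    // Illustrative stand-in for CarbonWriterBuilder; not the SDK's actual code.
    class BuilderSketch {
      private String filePath;
      private String[] dataFiles;

      BuilderSketch withAvroPath(String filePath) {
        if (filePath == null || filePath.isEmpty()) {
          throw new IllegalArgumentException("filePath can not be empty");
        }
        this.filePath = filePath;
        this.dataFiles = new String[] { filePath }; // file discovery elided
        return this;
      }

      String[] build() {
        // filePath is guaranteed non-empty whenever withAvroPath() was used,
        // so a guard on dataFiles would be the uniform check across writers
        if (this.dataFiles != null) {
          return this.dataFiles;
        }
        return new String[0];
      }
    }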





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-27 Thread GitBox


akashrn5 commented on pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681846351


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-681868472


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3888/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478329089



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)

Review comment:
   removed multiple readings, reading only once

##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+      mainTblLoadMetadataDetails,
+      siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)

Review comment:
   removed, it was redundant code





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905#issuecomment-681874973


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2147/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServ

2020-08-27 Thread GitBox


vikramahuja1001 commented on pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#issuecomment-681886430


   Please make the necessary changes in the index-server documentation as well for the property changes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 opened a new pull request #3905: [CARBONDATA-3964] Fixed null pointer exception for select * and select count(*) without filter.

2020-08-27 Thread GitBox


nihal0107 opened a new pull request #3905:
URL: https://github.com/apache/carbondata/pull/3905


### Why is this PR needed?
In the case of 1 million records and 500 segments, a select query without a filter is throwing a null pointer exception.

### What changes were proposed in this PR?
   A select query without a filter should execute the pruneWithoutFilter method rather than pruneWithMultiThread. Added a null check for the filter; a minimal sketch follows the checklist below.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
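
A minimal sketch of the guard described above, with simplified stand-in types rather than the actual TableIndex.java signatures (illustrative only):

    import java.util.List;

    // Illustrative stand-ins for TableIndex and the prune* methods named in the JIRA.
    class PruneSketch {
      static List<String> prune(List<String> segments, Object filterResolver) {
        if (filterResolver == null) {
          // select * / select count(*) carry no filter, so skip the
          // multi-threaded filter pruning that would dereference the resolver
          return pruneWithoutFilter(segments);
        }
        return pruneWithMultiThread(segments, filterResolver);
      }

      static List<String> pruneWithoutFilter(List<String> segments) {
        return segments; // keep all blocklets of the given segments
      }

      static List<String> pruneWithMultiThread(List<String> segments, Object resolver) {
        return segments; // filter-based pruning would happen here
      }
    }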
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3890:
URL: https://github.com/apache/carbondata/pull/3890#issuecomment-681845603


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2144/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681845433


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2145/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478314463



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,
+      indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val mainCarbonTable = metaStore
+      .lookupRelation(Some(databaseName), tableNameOp)(sparkSession)
+      .asInstanceOf[CarbonRelation].carbonTable
+
+    val carbonLoadModel = new CarbonLoadModel
+    carbonLoadModel.setDatabaseName(databaseName)
+    carbonLoadModel.setTableName(tableNameOp)
+    carbonLoadModel.setTablePath(mainCarbonTable.getTablePath)
+    val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath)
+    carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager
+      .readTableStatusFile(tableStatusFilePath).toList.asJava)
+    carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable))
+
+    val indexMetadata = mainCarbonTable.getIndexMetadata
+    val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+    if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+        null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
+      val indexTables = indexMetadata.getIndexesMap
+        .get(secondaryIndexProvider).keySet().asScala
+      // if there are no index tables for a given fact table do not perform any action
+      if (indexTables.nonEmpty) {
+        val mainTableDetails = if (segments.isEmpty) {
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList
+          // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+        } else {
+          // get segments for main table

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r478314130



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+      mainTblLoadMetadataDetails,
+      siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+      // If it empty, then no need to do further computations because the
+      // tabletstatus might not have been created and hence next load will take care
+      if (details.isEmpty) {
+        Seq.empty
+      }
+
+      val failedLoadMetadataDetails: java.util.List[LoadMetadataDetails] = new util
+      .ArrayList[LoadMetadataDetails]()
+
+      // read the details of SI table and get all the failed segments during SI
+      // creation which are MARKED_FOR_DELETE or invalid INSERT_IN_PROGRESS
+      details.collect {

Review comment:
   done

##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+      mainTblLoadMetadataDetails,
+      siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),

Review comment:
   done

##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get

Review comment:
   done





This is 

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478323052



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }
 
+  private void validateCsvFiles() throws IOException {

Review comment:
   These validate methods (validateAvroFiles, validateParquetFiles, validateCsvFiles, validateJsonFiles, validateOrcFiles) can live in the respective carbon writers, because the way each validate method is implemented is specific to its format: the validate methods obtain readers and parse the input, and very similar code already exists in the respective writers.
   
   Also, can the validation method be called from `CarbonWriterBuilder.build()` based on the respective `writerType` when `dataFiles` is not null? I think it can be an abstract method in `CarbonWriter` to set the input files to read, implemented by all the writers. The writers can then validate and set them.
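
   A hedged sketch of the proposed shape, with simplified stand-ins for the SDK types (the class names and the dispatch are illustrative, not the actual CarbonWriter API):

    import java.io.IOException;

    // Each concrete writer validates files in its own format, then keeps them.
    abstract class WriterSketch {
      abstract void validateAndSetDataFiles(String[] dataFiles) throws IOException;
    }

    class JsonWriterSketch extends WriterSketch {
      private String[] dataFiles;

      @Override
      void validateAndSetDataFiles(String[] dataFiles) throws IOException {
        for (String file : dataFiles) {
          if (file.isEmpty()) {
            throw new IOException("empty file name"); // real code would parse the JSON here
          }
        }
        this.dataFiles = dataFiles;
      }
    }

    class BuildDispatchSketch {
      // build() dispatches validation based on the configured writer type
      static WriterSketch build(String[] dataFiles) throws IOException {
        WriterSketch writer = new JsonWriterSketch(); // choose by writerType in reality
        if (dataFiles != null) {
          writer.validateAndSetDataFiles(dataFiles);
        }
        return writer;
      }
    }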





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478333555



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -660,13 +1113,42 @@ public CarbonWriter build() throws IOException, InvalidLoadOptionException {
       // removed from the load. LoadWithoutConverter flag is going to point to the Loader Builder
       // which will skip Conversion Step.
       loadModel.setLoadWithoutConverterStep(true);
-      return new AvroCarbonWriter(loadModel, hadoopConf, this.avroSchema);
+      AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,
+          hadoopConf, this.avroSchema);
+      if (!StringUtils.isEmpty(filePath)) {
+        avroCarbonWriter.setDataFiles(this.dataFiles);
+      }
+      return avroCarbonWriter;
     } else if (this.writerType == WRITER_TYPE.JSON) {
       loadModel.setJsonFileLoad(true);
-      return new JsonCarbonWriter(loadModel, hadoopConf);
+      JsonCarbonWriter jsonCarbonWriter = new JsonCarbonWriter(loadModel, hadoopConf);
+      if (!StringUtils.isEmpty(filePath)) {
+        jsonCarbonWriter.setDataFiles(this.dataFiles);
+      }
+      return jsonCarbonWriter;
+    } else if (this.writerType == WRITER_TYPE.PARQUET) {
+      loadModel.setLoadWithoutConverterStep(true);
+      AvroCarbonWriter avroCarbonWriter = new AvroCarbonWriter(loadModel,
+          hadoopConf, this.avroSchema);
+      ParquetCarbonWriter parquetCarbonWriter = new
+          ParquetCarbonWriter(avroCarbonWriter, hadoopConf);

Review comment:
   Instead of creating an instance of `AvroCarbonWriter` and passing it to `ParquetCarbonWriter`, create the `AvroCarbonWriter` instance internally within the constructor of `ParquetCarbonWriter`. Suggest the same for `ORCCarbonWriter` below.
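
   A minimal sketch of the suggested change, with stand-in types rather than the SDK's actual constructors:

    // Stand-in types; illustrative only, not the SDK's actual classes.
    class AvroWriterStub {
      AvroWriterStub(Object loadModel, Object hadoopConf, Object avroSchema) {
        // a real writer would wire up the load model here
      }
    }

    class ParquetWriterSketch {
      private final AvroWriterStub delegate;

      // the delegate is constructed internally, so callers no longer build
      // an AvroCarbonWriter themselves just to pass it in
      ParquetWriterSketch(Object loadModel, Object hadoopConf, Object avroSchema) {
        this.delegate = new AvroWriterStub(loadModel, hadoopConf, avroSchema);
      }
    }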





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV

2020-08-27 Thread GitBox


ShreelekhyaG commented on a change in pull request #3890:
URL: https://github.com/apache/carbondata/pull/3890#discussion_r478240384



##
File path: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
##
@@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with BeforeAndAfterAll {
     sql("RESET")
   }
 
+  test("TC_014-test mv after reset properties") {
+    sql("drop table if exists maintable")
+    sql("drop MATERIALIZED VIEW if exists mv1")
+    sql("CREATE TABLE maintable(empno int,empname string,projectcode int, projectjoindate " +
+        "Timestamp, projectenddate date,salary double) STORED AS carbondata")
+    sql("CREATE MATERIALIZED VIEW mv1 as select timeseries(projectenddate,'day'), sum" +
+        "(projectcode) from maintable group by timeseries(projectenddate,'day')")
+    sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 12:07:28','2016-05-20'," +
+        "15000.00")
+    sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 17:32:20','2016-05-20'," +
+        "25000.00")
+    sql("set carbon.input.segments.defualt.maintable=1")
+    checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), sum(projectcode) from " +

Review comment:
   ok added check with `verifyMVHit`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-681759193


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3884/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478250989



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java
##
@@ -823,6 +829,31 @@ public void write(Object object) throws IOException {
     }
   }
 
+  /**
+   * Load data of all avro files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+    if (this.dataFiles == null || this.dataFiles.length == 0) {
+      throw new RuntimeException("'withAvroPath()' must be called to support loading avro files");
+    }
+    Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));
+    for (CarbonFile dataFile : this.dataFiles) {
+      this.loadSingleFile(dataFile);
+    }
+  }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+    DataFileStream<GenericData.Record> avroReader = SDKUtil
+        .buildAvroReader(file, this.configuration);

Review comment:
   The `avroReader` stream is not closed, neither in the success case nor in the failure/exception cases. The file InputStream used to create this DataFileStream in buildAvroReader is not closed either. Check all the stream reader/writer cases in this PR.
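
   A minimal sketch of the fix being asked for, with a plain InputStream standing in for the Avro DataFileStream (illustrative only, not the PR's actual code):

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    class CloseReaderSketch {
      static void loadSingleFile(InputStream in) throws IOException {
        // try-with-resources closes the stream on success and on exception
        try (InputStream reader = in) {
          while (reader.read() != -1) {
            // consume records here
          }
        }
      }

      public static void main(String[] args) throws IOException {
        loadSingleFile(new ByteArrayInputStream(new byte[] {1, 2, 3}));
      }
    }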





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.

2020-08-27 Thread Nihal kumar ojha (Jira)


 [ https://issues.apache.org/jira/browse/CARBONDATA-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nihal kumar ojha updated CARBONDATA-3964:
-
Priority: Minor  (was: Major)

> Select * from table or select count(*) without filter is throwing null pointer exception.
> -
>
> Key: CARBONDATA-3964
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3964
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> 1. Create a table.
> 2. Load around 500 segments and more than 1 million records.
> 3. Run select * or select count(*) without a filter; it throws a null
> pointer exception.
> File: TableIndex.java
> Method: pruneWithMultiThread
> Line: 447
> Reason: filter.getResolver() is null.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681843843


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3886/
   







[jira] [Created] (CARBONDATA-3964) Select * from table or select count(*) without filter is throwing null pointer exception.

2020-08-27 Thread Nihal kumar ojha (Jira)
Nihal kumar ojha created CARBONDATA-3964:


 Summary: Select * from table or select count(*) without filter is 
throwing null pointer exception.
 Key: CARBONDATA-3964
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3964
 Project: CarbonData
  Issue Type: Bug
Reporter: Nihal kumar ojha


Steps to reproduce:
1. Create a table.
2. Load around 500 segments and more than 1 million records.
3. Run select * or select count(*) without a filter; it throws a null pointer exception.

File: TableIndex.java
Method: pruneWithMultiThread
Line: 447
Reason: filter.getResolver() is null.






[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478195952



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java
##
@@ -823,6 +829,31 @@ public void write(Object object) throws IOException {
 }
   }
 
+  /**
+   * Load data of all avro files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withAvroPath()' must be called to support loading avro files");
+}
+Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));

Review comment:
   Is this sort required? The same applies to the other writers too.









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478284566



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CSVCarbonWriter.java
##
@@ -72,6 +93,36 @@ public void write(Object object) throws IOException {
 }
   }
 
+  /**
+   * Load data of all or selected csv files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withCsvPath()' must be called to support load files");
+}
+this.csvParser = SDKUtil.buildCsvParser(this.configuration);
+Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));
+for (CarbonFile dataFile : this.dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+this.csvParser.beginParsing(FileFactory.getDataInputStream(file.getPath(), -1, configuration));

Review comment:
   The InputStream is not closed; same as above.
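
   One way the stream could be released, sketched under the assumption that rows are pulled with univocity's `parseNext()`; `stopParsing()` releases the input the parser was given, so calling it in a finally block prevents the leak:

```java
private void loadSingleFile(CarbonFile file) throws IOException {
  this.csvParser.beginParsing(
      FileFactory.getDataInputStream(file.getPath(), -1, configuration));
  try {
    String[] row;
    while ((row = this.csvParser.parseNext()) != null) {
      this.write(row);  // hypothetical per-row handoff to the carbon writer
    }
  } finally {
    this.csvParser.stopParsing();  // closes the underlying input stream
  }
}
```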









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3890: [CARBONDATA-3952] After reset query not hitting MV

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3890:
URL: https://github.com/apache/carbondata/pull/3890#issuecomment-681842224


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3885/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681894212


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3889/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478375043




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681969792


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2151/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-681987195


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2152/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478378839




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478395584



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java
##
@@ -91,4 +106,44 @@ public void close() throws IOException {
   throw new IOException(e);
 }
   }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+Reader reader = null;
+try {
+  reader = SDKUtil.buildJsonReader(file, configuration);
+  JSONParser jsonParser = new JSONParser();
+  Object jsonRecord = jsonParser.parse(reader);
+  if (jsonRecord instanceof JSONArray) {
+JSONArray jsonArray = (JSONArray) jsonRecord;
+for (Object record : jsonArray) {
+  this.write(record.toString());
+}
+  } else {
+this.write(jsonRecord.toString());
+  }
+} catch (Exception e) {

Review comment:
   It's good to use specific exceptions wherever possible instead of a generic exception.
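
   A sketch of both review points together, assuming the `loadSingleFile` shape above: catch the concrete checked exceptions the body can throw (`IOException` from the reader, `ParseException` from json-simple) instead of a blanket `catch (Exception e)`, and let try-with-resources close the reader:

```java
private void loadSingleFile(CarbonFile file) throws IOException {
  try (Reader reader = SDKUtil.buildJsonReader(file, configuration)) {
    Object jsonRecord = new JSONParser().parse(reader);
    if (jsonRecord instanceof JSONArray) {
      for (Object record : (JSONArray) jsonRecord) {
        this.write(record.toString());
      }
    } else {
      this.write(jsonRecord.toString());
    }
  } catch (ParseException e) {
    // surface the parse failure with the file name, original cause preserved
    throw new IOException("File " + file.getPath() + " is not in json format", e);
  }
}
```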









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681940790


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3891/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r478367442



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec
  DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter;
  decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType);
 }
+  } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY
+|| vectorDataType == DataTypes.VARCHAR) {
+  // for complex primitive string, binary, varchar type
+  int offset = 0;
+  for (int i = 0; i < pageSize; i++) {
+byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
   done

##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java
##
@@ -98,6 +98,9 @@
*/
   protected CarbonIterator queryIterator;
 
+  // Size of the ReusableDataBuffer based on the number of dimension projection columns
+  protected int reusableDimensionBufferSize = 0;

Review comment:
   done

##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
##
@@ -153,6 +154,9 @@
 
   private ReusableDataBuffer[] measureReusableBuffer;
 
+  // index used by dimensionReusableBuffer
+  int dimensionReusableBufferIndex = 0;

Review comment:
   done













[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681891693


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2148/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478354030



+   * @param fileList list of files which has to be 

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681900533


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2149/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


ajantha-bhat commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681914750


   > So for complex types, the user must configure the table page size while creating the table to get better query performance.
   
   @kumarvishal09: I strongly agree too; if the user stores complex types with huge data in one row, configuring the page size is mandatory.
   If I make it the default now, some test case validations will fail (for example, the CLI test cases that check the page size); I will handle that in another PR.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-681965099


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3892/
   







[GitHub] [carbondata] vikramahuja1001 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


vikramahuja1001 commented on pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681994461


   retest this please







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r478374714



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,65 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+  CarbonColumnVector vector, DataType vectorDataType) {
+VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+Stack vectorStack = vectorInfo.getVectorStack();
+// check and update to child vector info
+if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {

Review comment:
   `Objects.isNull` and `Objects.nonNull` internally do just the same thing.
   For a plain if check they are not very useful (we have not used them anywhere in our code for if checks either).
   **I will use it in the future if I use streams.**
   
   ![Screenshot from 2020-08-27 17-43-51](https://user-images.githubusercontent.com/5889404/91441253-7d649d00-e88d-11ea-8d30-aa73af1cd7d6.png)
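
   For reference, a self-contained sketch of the trade-off being discussed: the two spellings are equivalent inside an if check, and the `Objects` form mainly earns its keep as a method reference in streams (class and method names below are made up for illustration).

```java
import java.util.Objects;
import java.util.stream.Stream;

class NullCheckStyles {
  // plain check: reads naturally inside an if condition
  static boolean plain(Object v) {
    return v != null;
  }

  // java.util.Objects spelling: identical semantics
  static boolean viaObjects(Object v) {
    return Objects.nonNull(v);
  }

  // where Objects.nonNull actually pays off: as a method reference
  static long countNonNull(Stream<Object> values) {
    return values.filter(Objects::nonNull).count();
  }
}
```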
   
   
   









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#issuecomment-681942622


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2150/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-681978515


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3893/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903#issuecomment-681898190


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3890/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478387596



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/JsonCarbonWriter.java
##
@@ -91,4 +102,39 @@ public void close() throws IOException {
   throw new IOException(e);
 }
   }
+
+  private void loadSingleFile(File file) throws IOException {
+try {
+  Reader reader = SDKUtil.buildJsonReader(file);
+  JSONParser jsonParser = new JSONParser();
+  Object jsonRecord = jsonParser.parse(reader);
+  if (jsonRecord instanceof JSONArray) {
+JSONArray jsonArray = (JSONArray) jsonRecord;
+for (Object record : jsonArray) {
+  this.write(record.toString());
+}
+  } else {
+this.write(jsonRecord.toString());
+  }
+} catch (Exception e) {
+  e.printStackTrace();
+  throw new IOException(e.getMessage());

Review comment:
   Need to close the stream in validateJsonFiles as well. Please check all the stream cases in this PR.
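
   A sketch of the same point applied to the builder-side validation (assuming the `validateJsonFiles` shape quoted elsewhere in this PR): the reader opened purely for validation must be released too.

```java
private void validateJsonFiles() throws IOException {
  CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
  for (CarbonFile dataFile : dataFiles) {
    // try-with-resources closes the validation reader on every path
    try (Reader reader = SDKUtil.buildJsonReader(dataFile, this.hadoopConf)) {
      new JSONParser().parse(reader);
    } catch (ParseException ex) {
      throw new RuntimeException("File " + dataFile + " is not in json format.", ex);
    }
  }
  this.dataFiles = dataFiles;
}
```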









[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478408315



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.sdk.file.utils.SDKUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private Configuration configuration;
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private Reader orcReader = null;
+  private CarbonFile[] dataFiles;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) {
+this.csvCarbonWriter = csvCarbonWriter;
+this.configuration = configuration;
+  }
+
+  @Override
+  public void setDataFiles(CarbonFile[] dataFiles) {
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withOrcPath()' must be called to support loading ORC files");
+}
+if (this.csvCarbonWriter == null) {

Review comment:
   We shouldn't have created the writer instance in the first place if `csvCarbonWriter` was null.
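
   One way to encode this (sketch, not the PR's code): make the dependency mandatory at construction time, so `write()` never needs the null check.

```java
ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) {
  // fail fast instead of checking csvCarbonWriter for null on every write()
  this.csvCarbonWriter = Objects.requireNonNull(csvCarbonWriter,
      "ORCCarbonWriter requires a backing CSVCarbonWriter");
  this.configuration = configuration;
}
```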









[GitHub] [carbondata] VenuReddy2103 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-27 Thread GitBox


VenuReddy2103 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-681588563


   LGTM







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r478153107



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,51 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
 }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+   "saving time from different timezone") {
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE,
 "true")
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
+"TBLPROPERTIES('dateformat'='-MM-dd', 
'timestampformat'='-MM-dd HH:mm:ss') ")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+sql("DROP TABLE test_time")
+
CarbonProperties.getInstance().removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for 
daylight " +
+   "saving time from different timezone") {
+sql("set carbon.load.dateformat.setlenient.enable = true")
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time 
Timestamp) STORED AS carbondata " +
+"TBLPROPERTIES('dateformat'='-MM-dd', 
'timestampformat'='-MM-dd HH:mm:ss') ")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), 
Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"
+sql("DROP TABLE test_time")
+defaultConfig()
+  }
+
+  def generateCSVFile(): Unit = {
+val rows = new ListBuffer[Array[String]]
+rows += Array("ID", "date", "time")
+rows += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+rows += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+BadRecordUtil.createCSV(rows, csvPath)
+  }
+
   override def afterAll {
 sql("DROP TABLE IF EXISTS t3")
+FileUtils.forceDelete(new File(csvPath))
+TimeZone.setDefault(defaultTimeZone)

Review comment:
   `afterAll()` is called only once per testcase file. But, have to set  
back`TimeZone.setDefault(defaultTimeZone)` at the end of the testcase  where 
you changed  zone to China/Shanghai.









[GitHub] [carbondata] Indhumathi27 closed pull request #3900: [WIP] Test

2020-08-27 Thread GitBox


Indhumathi27 closed pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900


   







[jira] [Resolved] (CARBONDATA-3955) Fix load failures due to daylight saving time changes

2020-08-27 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-3955.
-
Fix Version/s: 2.1.0
   Resolution: Fixed

> Fix load failures due to daylight saving time changes
> -
>
> Key: CARBONDATA-3955
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3955
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> 1) Fix load failures due to daylight saving time changes.
> 2) During load, date/timestamp year values with >4 digits should fail or be
> null according to the bad records action property.





[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681629618


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2142/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478194800



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java
##
@@ -25,17 +25,12 @@
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-import java.util.UUID;
+import java.util.*;

Review comment:
   Please do not use wildcard import.













[GitHub] [carbondata] asfgit closed pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-27 Thread GitBox


asfgit closed pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896


   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r472407770



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
 return this;
   }
 
+  private void validateCsvFiles() throws IOException {
+CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+  throw new RuntimeException("CSV files can't be empty.");
+}
+for (CarbonFile dataFile : dataFiles) {
+  try {
+CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+-1, this.hadoopConf));
+  } catch (IllegalArgumentException ex) {
+if (ex.getCause() instanceof FileNotFoundException) {
+  throw new FileNotFoundException("File " + dataFile +
+  " not found to build carbon writer.");
+}
+throw ex;
+  }
+}
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+this.validateFilePath(filePath);
+this.filePath = filePath;
+this.setIsDirectory(filePath);
+this.withCsvInput();
+this.validateCsvFiles();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts CSV files directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the CSV file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withCsvPath(filePath);
+return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+for (CarbonFile dataFile : dataFiles) {
+  try {
+new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+  } catch (FileNotFoundException ex) {
+throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+  } catch (ParseException ex) {
+throw new RuntimeException("File " + dataFile + " is not in json format.");
+  }
+}
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+this.validateFilePath(filePath);
+this.filePath = filePath;
+this.setIsDirectory(filePath);
+this.withJsonInput();
+this.validateJsonFiles();
+return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts JSON file directory and
+   * list of file which has to be loaded.
+   *
+   * @param filePath directory where the json file exists.
+   * @param fileList list of files which has to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+  throws IOException {
+this.fileList = fileList;
+this.withJsonPath(filePath);
+return this;
+  }
+
+  private void validateFilePath(String filePath) {
+if (StringUtils.isEmpty(filePath)) {
+  throw new IllegalArgumentException("filePath can not be empty");
+}
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+this.validateFilePath(filePath);
+this.filePath = filePath;
+this.setIsDirectory(filePath);
+this.writerType = WRITER_TYPE.PARQUET;
+this.validateParquetFiles();
+return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+if (this.hadoopConf == null) {
+  this.hadoopConf = new Configuration(FileFactory.getConfiguration());

Review comment:
   I checked the base code: there we seem to assign the return value of FileFactory.getConfiguration() directly instead of creating a new Configuration. Suggest checking and keeping it consistent. Check all the places in this PR.
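
   For illustration, the two assignment styles side by side in a self-contained sketch (`Configuration` is Hadoop's; the class and method names are made up). Copying isolates the builder from later mutations of the shared configuration, while direct assignment keeps behaviour consistent with the existing code base, which is what the review asks for.

```java
import org.apache.hadoop.conf.Configuration;

class ConfAssignmentStyles {
  private Configuration hadoopConf;

  // the PR's style: defensive copy of the global configuration
  void initWithCopy(Configuration global) {
    this.hadoopConf = new Configuration(global);
  }

  // the base-code style the reviewer points to: share the instance directly
  void initDirect(Configuration global) {
    this.hadoopConf = global;
  }
}
```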






[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681626492


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3883/
   







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478227905



##
File path: 
sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/utils/SDKUtil.java
##
@@ -79,4 +98,75 @@ public static ArrayList listFiles(String sourceImageFolder,
 return (Object[]) input[i];
   }
 
+  public static List extractFilesFromFolder(String path,
+  String suf, Configuration hadoopConf) {
+List dataFiles = listFiles(path, suf, hadoopConf);
+List carbonFiles = new ArrayList<>();
+for (Object dataFile: dataFiles) {
+  carbonFiles.add(FileFactory.getCarbonFile(dataFile.toString(), hadoopConf));
+}
+if (CollectionUtils.isEmpty(dataFiles)) {
+  throw new RuntimeException("No file found at given location. Please provide" +
+  "the correct folder location.");
+}
+return carbonFiles;
+  }
+
+  public static DataFileStream buildAvroReader(CarbonFile carbonFile,
+   Configuration configuration) throws IOException {
+try {
+  GenericDatumReader genericDatumReader =
+  new GenericDatumReader<>();
+  DataFileStream avroReader =
+  new DataFileStream<>(FileFactory.getDataInputStream(carbonFile.getPath(),
+  -1, configuration), genericDatumReader);
+  return avroReader;
+} catch (FileNotFoundException ex) {
+  throw new FileNotFoundException("File " + carbonFile.getPath()
+  + " not found to build carbon writer.");
+} catch (IOException ex) {
+  if (ex.getMessage().contains("Not a data file")) {

Review comment:
   Why catch `IOException` and rethrow it as `RuntimeException` here? You converted a checked exception to an unchecked one and discarded the original exception completely. Better to preserve the original exception, since there is no action taken after catching it other than changing the exception message.
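
   A sketch of the suggested shape, keeping the checked `IOException` type and attaching the original exception as the cause (the `GenericRecord` datum type is an assumption; the helper names follow the hunk above):

```java
public static DataFileStream<GenericRecord> buildAvroReader(CarbonFile carbonFile,
    Configuration configuration) throws IOException {
  try {
    return new DataFileStream<>(
        FileFactory.getDataInputStream(carbonFile.getPath(), -1, configuration),
        new GenericDatumReader<GenericRecord>());
  } catch (IOException ex) {
    if (ex.getMessage() != null && ex.getMessage().contains("Not a data file")) {
      // same message as before, but the original exception travels as the cause
      throw new IOException("File " + carbonFile.getPath() + " is not an avro file.", ex);
    }
    throw ex;
  }
}
```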









[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r478227589



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,65 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize,
+  CarbonColumnVector vector, DataType vectorDataType) {
+VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+Stack vectorStack = vectorInfo.getVectorStack();
+// check and update to child vector info
+if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) {
+  if (DataTypes.isArrayType(vectorDataType)) {
+List childElementsCountForEachRow =
+((CarbonColumnVectorImpl) vector.getColumnVector())
+.getNumberOfChildrenElementsInEachRow();
+int newPageSize = 0;
+for (int childElementsCount : childElementsCountForEachRow) {
+  newPageSize += childElementsCount;
+}
+vectorUtil.setPageSize(newPageSize);
+  }
+  // child vector flow, so fill the child vector
+  CarbonColumnVector childVector = vectorStack.pop();
+  vectorUtil.setVector(childVector);
+  vectorUtil.setVectorDataType(childVector.getType());
+}
+return vectorUtil;
+  }
+
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    private VectorUtil(int pageSize, CarbonColumnVector vector, DataType vectorDataType) {
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {

Review comment:
   Removed this whole class and now update the vector inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory** instead.
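
   As a side note on the page-size rewrite in `checkAndUpdateToChildVector` above: for an array column the child vector must hold every element of every parent row, so the page size becomes the sum of the per-row child counts. A toy, self-contained illustration (the counts are made up, not CarbonData API):

```java
public class ChildPageSizeExample {
  public static void main(String[] args) {
    // Hypothetical per-row child element counts for an array column
    // with 4 parent rows.
    int[] childElementsCountForEachRow = {2, 3, 0, 1};
    int newPageSize = 0;
    for (int count : childElementsCountForEachRow) {
      newPageSize += count;
    }
    // The parent page size of 4 rows becomes a child page size of 6 values.
    System.out.println(newPageSize); // prints 6
  }
}
```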

##
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,52 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {

Review comment:
   Removed this whole class and now update the vector inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory** instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-27 Thread GitBox


ajantha-bhat commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r478227936



##
File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,52 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  // Utility class to update current vector to child vector in case of complex type handling
+  public static class VectorUtil {
+    private ColumnVectorInfo vectorInfo;
+    private int pageSize;
+    private CarbonColumnVector vector;
+    private DataType vectorDataType;
+
+    public VectorUtil(ColumnVectorInfo vectorInfo, int pageSize, CarbonColumnVector vector,
+        DataType vectorDataType) {
+      this.vectorInfo = vectorInfo;
+      this.pageSize = pageSize;
+      this.vector = vector;
+      this.vectorDataType = vectorDataType;
+    }
+
+    public int getPageSize() {
+      return pageSize;
+    }
+
+    public CarbonColumnVector getVector() {
+      return vector;
+    }
+
+    public DataType getVectorDataType() {
+      return vectorDataType;
+    }
+
+    public VectorUtil checkAndUpdateToChildVector() {

Review comment:
   Removed this whole class and now update the vector inside **ColumnarVectorWrapperDirectFactory.getDirectVectorWrapperFactory** instead.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-681757236


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2143/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478495670



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.sdk.file.utils.SDKUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private Configuration configuration;
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private Reader orcReader = null;
+  private CarbonFile[] dataFiles;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) {
+this.csvCarbonWriter = csvCarbonWriter;
+this.configuration = configuration;
+  }
+
+  @Override
+  public void setDataFiles(CarbonFile[] dataFiles) {
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withOrcPath()' must be called to support 
loading ORC files");
+}
+if (this.csvCarbonWriter == null) {
+  throw new RuntimeException("csv carbon writer can not be null");
+}
+Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));
+for (CarbonFile dataFile : this.dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration);
+ObjectInspector objectInspector = orcReader.getObjectInspector();
+RecordReader recordReader = orcReader.rows();
+if (objectInspector instanceof StructObjectInspector) {
+  StructObjectInspector structObjectInspector =
+  (StructObjectInspector) orcReader.getObjectInspector();
+  while (recordReader.hasNext()) {
+Object record = recordReader.next(null); // to avoid duplication.

Review comment:
   I was looking at the API documentation and references on the internet; the info in the documentation is limited, though. `recordReader.next()` takes the previous record as an argument, but we are always passing null?
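
   For reference, the Hive ORC `RecordReader.next(Object previous)` lets the caller hand back the previously returned row so the reader can recycle it; passing `null` simply forces a fresh object per row. A hedged sketch of the reuse pattern (the class and method names here are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.hive.ql.io.orc.RecordReader;

public class OrcRowReuseSketch {
  // Hand the previously returned row back to next() so the reader can
  // recycle it instead of allocating a fresh object for every row.
  static void readAll(RecordReader recordReader) throws IOException {
    Object reusableRow = null;
    while (recordReader.hasNext()) {
      reusableRow = recordReader.next(reusableRow);
      // ... inspect/convert reusableRow and write it out, as in the diff ...
    }
  }
}
```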





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478545280



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.sdk.file.utils.SDKUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private Configuration configuration;
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private Reader orcReader = null;
+  private CarbonFile[] dataFiles;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) {
+this.csvCarbonWriter = csvCarbonWriter;
+this.configuration = configuration;
+  }
+
+  @Override
+  public void setDataFiles(CarbonFile[] dataFiles) {
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withOrcPath()' must be called to support 
loading ORC files");
+}
+if (this.csvCarbonWriter == null) {
+  throw new RuntimeException("csv carbon writer can not be null");
+}
+Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));
+for (CarbonFile dataFile : this.dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration);
+ObjectInspector objectInspector = orcReader.getObjectInspector();
+RecordReader recordReader = orcReader.rows();
+if (objectInspector instanceof StructObjectInspector) {
+  StructObjectInspector structObjectInspector =
+  (StructObjectInspector) orcReader.getObjectInspector();
+  while (recordReader.hasNext()) {
+Object record = recordReader.next(null); // to avoid duplication.
+List valueList = structObjectInspector.getStructFieldsDataAsList(record);
+for (int i = 0; i < valueList.size(); i++) {
+  valueList.set(i, parseOrcObject(valueList.get(i), 0));
+}
+this.csvCarbonWriter.write(valueList.toArray());
+  }
+} else {
+  while (recordReader.hasNext()) {

Review comment:
   Curious to know when does this else case hit ? You testcase do not seem 
to cover it
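
   For what it's worth, one way the non-struct branch could be exercised (a hypothetical setup using the older Hive ORC writer API, not taken from this PR's tests) is an ORC file whose top-level inspector is a primitive rather than a struct:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class NonStructOrcExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Top-level schema is a plain string, not a struct, so reading this
    // file back would not pass the StructObjectInspector instanceof check.
    Writer writer = OrcFile.createWriter(new Path("/tmp/non_struct.orc"),
        OrcFile.writerOptions(conf).inspector(
            PrimitiveObjectInspectorFactory.javaStringObjectInspector));
    writer.addRow("row-1");
    writer.close();
  }
}
```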





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] brijoobopanna commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-27 Thread GitBox


brijoobopanna commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682087451


   retest this please
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-27 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478495670



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/ORCCarbonWriter.java
##
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.sdk.file;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile;
+import org.apache.carbondata.sdk.file.utils.SDKUtil;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.ql.io.orc.OrcStruct;
+import org.apache.hadoop.hive.ql.io.orc.Reader;
+import org.apache.hadoop.hive.ql.io.orc.RecordReader;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.io.Text;
+
+
+/**
+ * Implementation to write ORC rows in CSV format to carbondata file.
+ */
+public class ORCCarbonWriter extends CSVCarbonWriter {
+  private Configuration configuration;
+  private CSVCarbonWriter csvCarbonWriter = null;
+  private Reader orcReader = null;
+  private CarbonFile[] dataFiles;
+
+  ORCCarbonWriter(CSVCarbonWriter csvCarbonWriter, Configuration configuration) {
+this.csvCarbonWriter = csvCarbonWriter;
+this.configuration = configuration;
+  }
+
+  @Override
+  public void setDataFiles(CarbonFile[] dataFiles) {
+this.dataFiles = dataFiles;
+  }
+
+  /**
+   * Load ORC file in iterative way.
+   */
+  @Override
+  public void write() throws IOException {
+if (this.dataFiles == null || this.dataFiles.length == 0) {
+  throw new RuntimeException("'withOrcPath()' must be called to support 
loading ORC files");
+}
+if (this.csvCarbonWriter == null) {
+  throw new RuntimeException("csv carbon writer can not be null");
+}
+Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));
+for (CarbonFile dataFile : this.dataFiles) {
+  this.loadSingleFile(dataFile);
+}
+  }
+
+  private void loadSingleFile(CarbonFile file) throws IOException {
+orcReader = SDKUtil.buildOrcReader(file.getPath(), this.configuration);
+ObjectInspector objectInspector = orcReader.getObjectInspector();
+RecordReader recordReader = orcReader.rows();
+if (objectInspector instanceof StructObjectInspector) {
+  StructObjectInspector structObjectInspector =
+  (StructObjectInspector) orcReader.getObjectInspector();
+  while (recordReader.hasNext()) {
+Object record = recordReader.next(null); // to avoid duplication.

Review comment:
   I was looking at the API documentation and references on the internet; the info in the documentation is limited, though. `recordReader.next()` takes the previous record as an argument, but we are always passing null? The same applies to the else case below, too.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#issuecomment-682062796


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3894/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#issuecomment-682063656


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2153/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682145902


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3896/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682105315


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3895/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682106892


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2154/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-682146723


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2155/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#issuecomment-682231829


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2157/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906#issuecomment-682233816


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3898/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682211893


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2156/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682161092


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akkio-97 opened a new pull request #3906: [WIP] Added test cases for hive read complex types and handled other issues

2020-08-27 Thread GitBox


akkio-97 opened a new pull request #3906:
URL: https://github.com/apache/carbondata/pull/3906


   
   
### Why is this PR needed?
   1) Added test cases for hive read complex types.
   2) Handled issues related to reading of byte, varchar and decimal types.

### What changes were proposed in this PR?
   1) Added test cases for hive read complex types.
   2) Handled issues related to reading of byte, varchar and decimal types.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-27 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-682212894


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3897/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org