[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-26 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478195952



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java
##
@@ -823,6 +829,31 @@ public void write(Object object) throws IOException {
     }
   }
 
+  /**
+   * Load data of all avro files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+    if (this.dataFiles == null || this.dataFiles.length == 0) {
+      throw new RuntimeException("'withAvroPath()' must be called to support loading avro files");
+    }
+    Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));

Review comment:
   Is this sort required ?
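
   For context, a minimal usage sketch of the API this hunk adds. It is hedged: it assumes
   the PR's `withAvroPath(...)` builder method named in the error message, plus the standard
   SDK builder calls; the paths and writtenBy value are placeholders.

       // Sketch only -- not the PR's verbatim example.
       CarbonWriter writer = CarbonWriter.builder()
           .outputPath("/tmp/carbon_output")
           .writtenBy("AvroFileLoadExample")
           .withAvroPath("/tmp/avro_input")   // collects and validates the avro files
           .build();
       writer.write();    // no-arg write(): loads every collected file iteratively
       writer.close();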





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-26 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478194800



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java
##
@@ -25,17 +25,12 @@
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-import java.util.UUID;
+import java.util.*;

Review comment:
   Please do not use wildcard import.
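
   A sketch of the explicit imports that would replace the wildcard -- the exact set depends
   on what this file actually uses, but `Arrays` and `Comparator` are needed by the new
   write() shown above, on top of the imports the diff removed:

       import java.util.ArrayList;
       import java.util.Arrays;
       import java.util.Comparator;
       import java.util.HashMap;
       import java.util.Iterator;
       import java.util.List;
       import java.util.Map;
       import java.util.Random;
       import java.util.UUID;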









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681629618


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2142/







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files

2020-08-26 Thread GitBox


VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r472407770



##
File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
##
@@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }
 
+  private void validateCsvFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+    if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+      throw new RuntimeException("CSV files can't be empty.");
+    }
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+        csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+            -1, this.hadoopConf));
+      } catch (IllegalArgumentException ex) {
+        if (ex.getCause() instanceof FileNotFoundException) {
+          throw new FileNotFoundException("File " + dataFile +
+              " not found to build carbon writer.");
+        }
+        throw ex;
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withCsvInput();
+    this.validateCsvFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts a CSV files directory and
+   * the list of files which have to be loaded.
+   *
+   * @param filePath directory where the CSV files exist.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+      } catch (FileNotFoundException ex) {
+        throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+      } catch (ParseException ex) {
+        throw new RuntimeException("File " + dataFile + " is not in json format.");
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withJsonInput();
+    this.validateJsonFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts a JSON files directory and
+   * the list of files which have to be loaded.
+   *
+   * @param filePath directory where the json files exist.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List<String> fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withJsonPath(filePath);
+    return this;
+  }
+
+  private void validateFilePath(String filePath) {
+    if (StringUtils.isEmpty(filePath)) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.validateParquetFiles();
+    return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+    if (this.hadoopConf == null) {
+      this.hadoopConf = new Configuration(FileFactory.getConfiguration());

Review comment:
   Had checked the base code: there we directly assign the return value of
   FileFactory.getConfiguration() instead of wrapping it in a new Configuration.
   Suggest checking and keeping it consistent across all the places in this PR.
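
   A minimal sketch of the base-code pattern the comment points at (the isDirectory
   field is inferred from the method name; whether a defensive copy is ever needed
   depends on whether the builder mutates the conf):

       private void setIsDirectory(String filePath) {
         if (this.hadoopConf == null) {
           // Base-code style: reuse the shared configuration directly.
           this.hadoopConf = FileFactory.getConfiguration();
         }
         CarbonFile carbonFile = FileFactory.getCarbonFile(filePath, this.hadoopConf);
         this.isDirectory = carbonFile.isDirectory();
       }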






[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681626492


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3883/







[GitHub] [carbondata] Indhumathi27 closed pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


Indhumathi27 closed pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900


   







[GitHub] [carbondata] VenuReddy2103 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


VenuReddy2103 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-681588563


   LGTM







[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


VenuReddy2103 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r478153107



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,51 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll {
     }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance().addProperty(
+      CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE, "true")
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) " +
+        "STORED AS carbondata " +
+        "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("DROP TABLE test_time")
+    CarbonProperties.getInstance().removeProperty(
+      CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.dateformat.setlenient.enable = true")
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) " +
+        "STORED AS carbondata " +
+        "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"),
+      Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("DROP TABLE test_time")
+    defaultConfig()
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows = new ListBuffer[Array[String]]
+    rows += Array("ID", "date", "time")
+    rows += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+    BadRecordUtil.createCSV(rows, csvPath)
+  }
+
   override def afterAll {
     sql("DROP TABLE IF EXISTS t3")
+    FileUtils.forceDelete(new File(csvPath))
+    TimeZone.setDefault(defaultTimeZone)

Review comment:
   `afterAll()` is called only once per testcase file. But you have to set back
   `TimeZone.setDefault(defaultTimeZone)` at the end of each testcase where you
   changed the zone to Asia/Shanghai.
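
   A minimal sketch of the reset pattern being asked for, using the Java TimeZone
   API that the Scala test calls into (runDaylightSavingScenario is a hypothetical
   stand-in for the test body):

       import java.util.TimeZone;

       // Save the default, switch zones for the scenario, and always restore
       // it, so later tests in the same JVM see the original default.
       TimeZone saved = TimeZone.getDefault();
       TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
       try {
           runDaylightSavingScenario();
       } finally {
           TimeZone.setDefault(saved);
       }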





[GitHub] [carbondata] kunal642 commented on a change in pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in Inde

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#discussion_r477384526



##
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1280,6 +1280,11 @@ private CarbonCommonConstants() {
   public static final String CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE =
       "carbon.max.executor.lru.cache.size";
 
+  /**
+   * when executor LRU cache is not configured, set it to 70% of executor memory size
+   */
+  public static final double CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT = 0.7d;

Review comment:
   Please expose a property so that the user can set the % value that is required
   as the LRU cache size. And make sure that the user cannot set it to invalid
   values like negative values, 0 (0%), 1 (100%), etc.
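
   A rough sketch of the suggested read-and-validate logic. The property name
   "carbon.executor.lru.cache.percent" is hypothetical; only the default constant
   exists in this hunk:

       public static double getExecutorLruCachePercent() {
         String value = CarbonProperties.getInstance().getProperty(
             "carbon.executor.lru.cache.percent",
             String.valueOf(CarbonCommonConstants.CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT));
         double percent;
         try {
           percent = Double.parseDouble(value);
         } catch (NumberFormatException e) {
           percent = CarbonCommonConstants.CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT;
         }
         // Reject negative, zero and >= 100% values, as the review asks.
         if (percent <= 0.0d || percent >= 1.0d) {
           percent = CarbonCommonConstants.CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT;
         }
         return percent;
       }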









[GitHub] [carbondata] kunal642 commented on a change in pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in Inde

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#discussion_r477382736



##
File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala
##
@@ -243,11 +243,26 @@ object IndexServer extends ServerInterface {
   def main(args: Array[String]): Unit = {
     if (serverIp.isEmpty) {
       throw new RuntimeException(s"Please set the server IP to use Index Cache Server")
-    } else if (!isExecutorLRUConfigured) {
-      throw new RuntimeException(s"Executor LRU cache size is not set. Please set using " +
-          s"${ CarbonCommonConstants.CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE }")
     } else {
       createCarbonSession()
+      if (!isExecutorLRUConfigured) {

Review comment:
   This is the wrong place to set the executor memory based on the %!!!
   If you set it here then only the driver knows about
   CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE; the executor would still use the default value.
   
   Please move this to the CacheProvider class.
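
   A rough sketch of what the fallback could look like once moved into CacheProvider,
   so the driver and every executor derive the same value (the method shape and the
   executorMemoryMb parameter are assumptions, not the PR's code):

       // Sketch: each JVM computes the same fallback instead of relying on
       // the driver having set the property.
       private long resolveExecutorLruCacheSizeMb(long executorMemoryMb) {
         String configured = CarbonProperties.getInstance()
             .getProperty(CarbonCommonConstants.CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE);
         if (configured != null) {
           return Long.parseLong(configured);
         }
         // Fall back to the default percentage of executor memory.
         return (long) (executorMemoryMb
             * CarbonCommonConstants.CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT);
       }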









[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r477356848



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala
##
@@ -57,189 +58,201 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
     event match {
       case postStatusUpdateEvent: LoadTablePostStatusUpdateEvent =>
         LOGGER.info("Load post status update event-listener called")
-        val loadTablePostStatusUpdateEvent = event.asInstanceOf[LoadTablePostStatusUpdateEvent]
-        val carbonLoadModel = loadTablePostStatusUpdateEvent.getCarbonLoadModel
-        val sparkSession = SparkSession.getActiveSession.get
-        // when Si creation and load to main table are parallel, get the carbonTable from the
-        // metastore which will have the latest index Info
-        val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
-        val carbonTable = metaStore
-          .lookupRelation(Some(carbonLoadModel.getDatabaseName),
-            carbonLoadModel.getTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable
-        val indexMetadata = carbonTable.getIndexMetadata
-        val secondaryIndexProvider = IndexType.SI.getIndexProviderName
-        if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+        if (CarbonProperties.getInstance().isSIRepairEnabled) {
+          val loadTablePostStatusUpdateEvent = event.asInstanceOf[LoadTablePostStatusUpdateEvent]
+          val carbonLoadModel = loadTablePostStatusUpdateEvent.getCarbonLoadModel
+          val sparkSession = SparkSession.getActiveSession.get
+          // when Si creation and load to main table are parallel, get the carbonTable from the
+          // metastore which will have the latest index Info
+          val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+          val carbonTable = metaStore
+            .lookupRelation(Some(carbonLoadModel.getDatabaseName),
+              carbonLoadModel.getTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable
+          val indexMetadata = carbonTable.getIndexMetadata
+          val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+          if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
             null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
-          val indexTables = indexMetadata.getIndexesMap
-            .get(secondaryIndexProvider).keySet().asScala
-          // if there are no index tables for a given fact table do not perform any action
-          if (indexTables.nonEmpty) {
-            val mainTableDetails =
-              SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
-            indexTables.foreach {
-              indexTableName =>
-                val isLoadSIForFailedSegments = sparkSession.sessionState.catalog
-                  .getTableMetadata(TableIdentifier(indexTableName,
-                    Some(carbonLoadModel.getDatabaseName))).storage.properties
-                  .getOrElse("isSITableEnabled", "true").toBoolean
-                val indexTable = metaStore
-                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
-                    sparkSession)
-                  .asInstanceOf[CarbonRelation]
-                  .carbonTable
+            val indexTables = indexMetadata.getIndexesMap
+              .get(secondaryIndexProvider).keySet().asScala
+            // if there are no index tables for a given fact table do not perform any action
+            if (indexTables.nonEmpty) {
+              val mainTableDetails =
+                SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+              indexTables.foreach {
+                indexTableName =>
+                  val isLoadSIForFailedSegments = sparkSession.sessionState.catalog
+                    .getTableMetadata(TableIdentifier(indexTableName,
+                      Some(carbonLoadModel.getDatabaseName))).storage.properties
+                    .getOrElse("isSITableEnabled", "true").toBoolean
+                  val indexTable = metaStore
+                    .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                      sparkSession)
+                    .asInstanceOf[CarbonRelation]
+                    .carbonTable
 
-                val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-                  SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
-                val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-                  SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
-                var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
-                if (!isLoadSIForFailedSegments
+                  val mainTblLoadMetadataDetails

[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533562



##
File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2262,6 +2262,32 @@ private CarbonCommonConstants() {
   public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT =
       500;
 
+  /**
+   * Configured property to enable/disable load failed segments in SI table during
+   * load/insert command.
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SI_REPAIR = "carbon.load.si.repair";
+
+  /**
+   * Default value for load failed segments in SI table during
+   * load/insert command.
+   */
+  public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true";
+
+  /**
+   * Property to give a limit to the number of segments that are reloaded in the
+   * SI table in the FailedSegments listener.
+   */
+  @CarbonProperty
+  public static final String CARBON_SI_REPAIR_LIMIT = "carbon.si.repair.limit";

Review comment:
   1. I think it's better to have these 2 properties at table level instead of global.
   
   2. If this is a dynamic conf, then add `dynamicConfigurable = true` to the annotation.
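
   For reference, the annotated form being asked for in point 2, following the
   pattern already used for CARBON_LOAD_SI_REPAIR in the same hunk:

       /**
        * Property to give a limit to the number of segments that are reloaded
        * in the SI table in the FailedSegments listener.
        */
       @CarbonProperty(dynamicConfigurable = true)
       public static final String CARBON_SI_REPAIR_LIMIT = "carbon.si.repair.limit";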









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477352862



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+      // If it is empty, then no need to do further computations because the
+      // tablestatus might not have been created and hence next load will take care
+      if (details.isEmpty) {
+        Seq.empty
+      }
+
+      val failedLoadMetadataDetails: java.util.List[LoadMetadataDetails] =
+        new util.ArrayList[LoadMetadataDetails]()
+
+      // read the details of SI table and get all the failed segments during SI
+      // creation which are MARKED_FOR_DELETE or invalid INSERT_IN_PROGRESS
+      details.collect {

Review comment:
   change to foreach









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477351673



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)

Review comment:
   Why 2nd time read is required for SI?
   









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477351380



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),

Review comment:
   change variable name to indexModel









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477350602



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)

Review comment:
   consider passing the loadMetadata details from the caller to avoid multiple reads.









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477348891



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }
+
+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession) : Unit = {
+    val sparkSession = SparkSession.getActiveSession.get

Review comment:
   please use the sparkSession which is passed 









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477348313



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,
+      indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val mainCarbonTable = metaStore
+      .lookupRelation(Some(databaseName), tableNameOp)(sparkSession)
+      .asInstanceOf[CarbonRelation].carbonTable
+
+    val carbonLoadModel = new CarbonLoadModel
+    carbonLoadModel.setDatabaseName(databaseName)
+    carbonLoadModel.setTableName(tableNameOp)
+    carbonLoadModel.setTablePath(mainCarbonTable.getTablePath)
+    val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath)
+    carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager
+      .readTableStatusFile(tableStatusFilePath).toList.asJava)
+    carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable))
+
+    val indexMetadata = mainCarbonTable.getIndexMetadata
+    val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+    if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+      null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
+      val indexTables = indexMetadata.getIndexesMap
+        .get(secondaryIndexProvider).keySet().asScala
+      // if there are no index tables for a given fact table do not perform any action
+      if (indexTables.nonEmpty) {
+        val mainTableDetails = if (segments.isEmpty) {
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList
+          // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+        } else {
+          // get segments for main table
+          carbonLoadModel.getLoadMetadataDetails.asScala.toList.filter(

[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477343487



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,
+      indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get

Review comment:
   pass the sparkSession from the caller









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477343137



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase
+      } else {
+        tableIdentifier.database.get
+      }
+      triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments)
+    } else {
+      // repairing si for all index tables in the mentioned database in the repair command
+      sparkSession.sessionState.catalog.listTables(dbName).foreach {
+        tableIdent =>
+          triggerRepair(tableIdent.table, dbName, indexnameOp, segments)
+      }
+    }
+    Seq.empty
+  }
+
+  def triggerRepair(tableNameOp: String, databaseName: String,

Review comment:
   change "tableNameOp" to "tableName"









[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477342379



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala
##
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command.index
+
+import java.util
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{CarbonEnv, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.execution.command.DataCommand
+import org.apache.spark.sql.hive.CarbonRelation
+import org.apache.spark.sql.index.CarbonIndexUtil
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.metadata.index.IndexType
+import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+
+/**
+ * Repair logic for reindex command on maintable/indextable
+ */
+case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier,
+    dbName: String,
+    segments: Option[List[String]]) extends DataCommand {
+
+  private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+  def processData(sparkSession: SparkSession): Seq[Row] = {
+    if (dbName == null) {
+      // dbName is null, repair for index table or all the index table in main table
+      val databaseName = if (tableIdentifier.database.isEmpty) {
+        SparkSession.getActiveSession.get.catalog.currentDatabase

Review comment:
   please use the sparkSession which is passed as parameter to this method









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680907790


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3882/







[GitHub] [carbondata] kumarvishal09 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-680904506


   @ajantha-bhat By default row filter push down is false. In case of complex 
types, e.g. an array of string, the number of records can be much more than 32k, 
so filling in one shot can be a problem because creating a big array will be an 
overhead. 
   
   + In case of direct fill we have ReusableDataBuffer to create one big byte 
array/column for one blocklet processing and reuse it for all the pages of a 
column inside. This won't be useful here because the number of elements in a 
complex type can vary, so the byte array size will change every page, and even 
creating a big byte array for one page when the number of child elements is 
large can degrade query performance. 
   
   So for complex types the user must configure the table page size while 
creating the table to get better query performance (a sketch follows below).
   
   If the number of rows in a page is higher, we can avoid ReusableDataBuffer 
as it may become an overhead. 
   
   Some comments:
   1. Pls refactor the code and avoid duplicate code where you can.
   2. Pls add comments.
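   
   Regarding the page size point above, a hedged sketch of the table-level 
setting (assuming the 'table_page_size_inmb' table property; pls verify the 
exact key and valid range in the DDL docs):
   
   ```
   sql("""
     CREATE TABLE complex_tbl (id INT, names ARRAY<STRING>)
     STORED AS carbondata
     TBLPROPERTIES('table_page_size_inmb'='1')
   """)
   ```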
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680902015


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2141/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313518



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
##
@@ -153,6 +154,9 @@
 
   private ReusableDataBuffer[] measureReusableBuffer;
 
+  // index used by dimensionReusableBuffer
+  int dimensionReusableBufferIndex = 0;

Review comment:
   private int dimensionReusableBufferIndex = 0;





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313518



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
##
@@ -153,6 +154,9 @@
 
   private ReusableDataBuffer[] measureReusableBuffer;
 
+  // index used by dimensionReusableBuffer
+  int dimensionReusableBufferIndex = 0;

Review comment:
   change to: private int dimensionReusableBufferIndex;





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313220



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java
##
@@ -98,6 +98,9 @@
*/
   protected CarbonIterator queryIterator;
 
+  // Size of the ReusableDataBuffer based on the number of dimension 
projection columns
+  protected int reusableDimensionBufferSize = 0;

Review comment:
   By default an int field's value is 0, so remove the explicit initialization.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3904:
URL: https://github.com/apache/carbondata/pull/3904#issuecomment-680889796


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2140/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3904:
URL: https://github.com/apache/carbondata/pull/3904#issuecomment-680889554


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3881/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, 
CarbonColumnVector vector, DataType vec
   DecimalConverterFactory.DecimalConverter decimalConverter = 
vectorInfo.decimalConverter;
   decimalConverter.fillVector(pageData, pageSize, vectorInfo, 
nullBits, pageDataType);
 }
+  } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+if (vectorDataType == DataTypes.STRING || vectorDataType == 
DataTypes.BINARY
+|| vectorDataType == DataTypes.VARCHAR) {
+  // for complex primitive string, binary, varchar type
+  int offset = 0;
+  for (int i = 0; i < pageSize; i++) {
+byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
   This byte array is not required; wrap the page data inside a ByteBuffer and 
update the position to get the length of the data.
   By this we can avoid creating multiple ByteBuffers and byte arrays. Pls 
handle all the places.
   
   ```
   ByteBuffer buffer = ByteBuffer.allocate(12);
   buffer.putInt(1);
   buffer.putInt(2);
   buffer.putInt(3);
   buffer.rewind();
   buffer.position(4);
   int anInt = buffer.getInt();
   ```
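   
   Applied to the loop in the diff, a hedged sketch (reuses pageData, pageSize 
and DataTypes from the surrounding code; the value handling line is 
illustrative):
   
   ```
   // wrap the page once, then walk it with position(); no per-row copy of
   // the 4-byte length prefix into a fresh byte array
   ByteBuffer pageBuffer = ByteBuffer.wrap(pageData);
   int offset = 0;
   for (int i = 0; i < pageSize; i++) {
     pageBuffer.position(offset);
     int length = pageBuffer.getInt();   // length prefix of value i
     offset += DataTypes.INT.getSizeInBytes() + length;
     // the value bytes start at pageBuffer.position() and span 'length' bytes
   }
   ```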





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, 
CarbonColumnVector vector, DataType vec
   DecimalConverterFactory.DecimalConverter decimalConverter = 
vectorInfo.decimalConverter;
   decimalConverter.fillVector(pageData, pageSize, vectorInfo, 
nullBits, pageDataType);
 }
+  } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+if (vectorDataType == DataTypes.STRING || vectorDataType == 
DataTypes.BINARY
+|| vectorDataType == DataTypes.VARCHAR) {
+  // for complex primitive string, binary, varchar type
+  int offset = 0;
+  for (int i = 0; i < pageSize; i++) {
+byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
   This byte array is not required; wrap the page data inside a ByteBuffer and 
update the position to get the length of the data.
   By this we can avoid creating multiple ByteBuffers and byte arrays. Pls 
handle all the places.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java
##
@@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, 
CarbonColumnVector vector, DataType vec
   DecimalConverterFactory.DecimalConverter decimalConverter = 
vectorInfo.decimalConverter;
   decimalConverter.fillVector(pageData, pageSize, vectorInfo, 
nullBits, pageDataType);
 }
+  } else if (pageDataType == DataTypes.BYTE_ARRAY) {
+if (vectorDataType == DataTypes.STRING || vectorDataType == 
DataTypes.BINARY
+|| vectorDataType == DataTypes.VARCHAR) {
+  // for complex primitive string, binary, varchar type
+  int offset = 0;
+  for (int i = 0; i < pageSize; i++) {
+byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()];

Review comment:
   This byte array is not required; wrap the page data inside a ByteBuffer and 
update the position to get the length of the data.
   By this we can avoid creating multiple ByteBuffers and byte arrays.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477292131



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,65 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo 
vectorInfo, int pageSize,
+  CarbonColumnVector vector, DataType vectorDataType) {
+VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+// check and update to child vector info
+if (vectorStack != null && vectorStack.peek() != null && 
vectorDataType.isComplexType()) {
+  if (DataTypes.isArrayType(vectorDataType)) {
+List<Integer> childElementsCountForEachRow =
+((CarbonColumnVectorImpl) vector.getColumnVector())
+.getNumberOfChildrenElementsInEachRow();
+int newPageSize = 0;
+for (int childElementsCount : childElementsCountForEachRow) {
+  newPageSize += childElementsCount;
+}
+vectorUtil.setPageSize(newPageSize);
+  }
+  // child vector flow, so fill the child vector
+  CarbonColumnVector childVector = vectorStack.pop();
+  vectorUtil.setVector(childVector);
+  vectorUtil.setVectorDataType(childVector.getType());
+}
+return vectorUtil;
+  }
+
+  // Utility class to update current vector to child vector in case of complex 
type handling
+  public static class VectorUtil {
+private int pageSize;
+private CarbonColumnVector vector;
+private DataType vectorDataType;
+
+private VectorUtil(int pageSize, CarbonColumnVector vector, DataType 
vectorDataType) {
+  this.pageSize = pageSize;
+  this.vector = vector;
+  this.vectorDataType = vectorDataType;
+}
+
+public int getPageSize() {

Review comment:
   We can use the org.apache.commons.lang3.tuple classes to return 3 values 
from one method; pls check if it can be used here.
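   
   A hedged sketch of the suggestion (assumes commons-lang3 is on the 
classpath; names reuse the VectorUtil fields above):
   
   ```
   import org.apache.commons.lang3.tuple.ImmutableTriple;
   import org.apache.commons.lang3.tuple.Triple;
   
   // return (pageSize, vector, vectorDataType) without a dedicated holder class
   Triple<Integer, CarbonColumnVector, DataType> result =
       ImmutableTriple.of(pageSize, vector, vectorDataType);
   int newPageSize = result.getLeft();
   CarbonColumnVector newVector = result.getMiddle();
   DataType newType = result.getRight();
   ```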





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477289221



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,65 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo 
vectorInfo, int pageSize,
+  CarbonColumnVector vector, DataType vectorDataType) {
+VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+// check and update to child vector info
+if (vectorStack != null && vectorStack.peek() != null && 
vectorDataType.isComplexType()) {

Review comment:
   Use Objects.nonNull for the not-null check and Objects.isNull for the null 
check.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-26 Thread GitBox


kumarvishal09 commented on a change in pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#discussion_r477289221



##
File path: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java
##
@@ -260,4 +265,65 @@ protected String debugInfo() {
 return this.toString();
   }
 
+  public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo 
vectorInfo, int pageSize,
+  CarbonColumnVector vector, DataType vectorDataType) {
+VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType);
+Stack<CarbonColumnVector> vectorStack = vectorInfo.getVectorStack();
+// check and update to child vector info
+if (vectorStack != null && vectorStack.peek() != null && 
vectorDataType.isComplexType()) {

Review comment:
   Use Objects.nonNull for the not-null check and Objects.isNull for the null 
check; pls handle this in all applicable places.
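   
   For example, the null checks in the diff above could read (a minimal 
sketch):
   
   ```
   import java.util.Objects;
   
   if (Objects.nonNull(vectorStack) && Objects.nonNull(vectorStack.peek())
       && vectorDataType.isComplexType()) {
     // ... body unchanged ...
   }
   ```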





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [WIP]Presto write transactional

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-680843631


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3880/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [WIP]Presto write transactional

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3875:
URL: https://github.com/apache/carbondata/pull/3875#issuecomment-680843243


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2139/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903#issuecomment-680840597


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2138/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903#issuecomment-680833851


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3879/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3963) timestamp data is wrong in case of reading carbon via hive and other issue

2020-08-26 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3963:
---

 Summary: timestamp data is wrong in case of reading carbon via 
hive and other issue
 Key: CARBONDATA-3963
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3963
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


1. timestamp data is wrong when a carbon table is read via hive
2. carbon is not giving any data in beeline when queried via hive



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary

2020-08-26 Thread Akash R Nilugal (Jira)
Akash R Nilugal created CARBONDATA-3962:
---

 Summary: Empty fact dirs are present in case of flat folder, which 
are unnecessary
 Key: CARBONDATA-3962
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3962
 Project: CarbonData
  Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal


Empty fact dirs are present in case of flat folder, which are unnecessary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] akashrn5 opened a new pull request #3904: []Remove unwanted empty fact directory in case of flat_folder table

2020-08-26 Thread GitBox


akashrn5 opened a new pull request #3904:
URL: https://github.com/apache/carbondata/pull/3904


### Why is this PR needed?
In case of flat folder, we write the data files directly at the table path, so 
the Fact dir is not required. It is left behind as an unwanted empty dir.

### What changes were proposed in this PR?
   Remove empty fact dirs
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680828750


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2137/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680827562


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3878/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680807986


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3876/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


asfgit closed pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#discussion_r477201000



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 val tableStatusFile = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
 FileFactory.getCarbonFile(tableStatusFile).delete()
 sql("insert into stale values('k')")
-checkAnswer(sql("select * from stale"), Row("k"))
+// if the table loses its tablestatus file, the system should keep all data.
+checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k")))

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


kunal642 commented on pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680798546


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680791844


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2135/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


QiangCai commented on a change in pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#discussion_r477191769



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 val tableStatusFile = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
 FileFactory.getCarbonFile(tableStatusFile).delete()
 sql("insert into stale values('k')")
-checkAnswer(sql("select * from stale"), Row("k"))
+// if the table loses its tablestatus file, the system should keep all data.
+checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k")))

Review comment:
   For this test case, the result is two rows; the behavior is correct.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


Indhumathi27 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680785361


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


QiangCai commented on a change in pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#discussion_r477183265



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 val tableStatusFile = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
 FileFactory.getCarbonFile(tableStatusFile).delete()
 sql("insert into stale values('k')")
-checkAnswer(sql("select * from stale"), Row("k"))
+// if the table loses its tablestatus file, the system should keep all data.
+checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k")))

Review comment:
   Yes.
   1. Now, after removing the tablestatus file, we consider all segments 
successful by default.
   2. In the future, we can add a flag file into the segment when we mark the 
segment for deletion; the flag file records that the segment is 
mark_for_delete.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


akashrn5 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680780532


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


kunal642 commented on a change in pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#discussion_r477173788



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 val tableStatusFile = 
CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath)
 FileFactory.getCarbonFile(tableStatusFile).delete()
 sql("insert into stale values('k')")
-checkAnswer(sql("select * from stale"), Row("k"))
+// if the table loses its tablestatus file, the system should keep all data.
+checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k")))

Review comment:
   I think this behaviour would be harmful when the segment is marked for 
delete. If the segment is not removed then the results would be incorrect.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680773026


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2134/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680771095


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3875/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


Zhangshunyu commented on pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680768668


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 opened a new pull request #3903: [wip]Fix hive timestamp data mismatch issue and empty data during query issue

2020-08-26 Thread GitBox


akashrn5 opened a new pull request #3903:
URL: https://github.com/apache/carbondata/pull/3903


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680760581


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2133/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680758465


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3874/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755507


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3873/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755157


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2132/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680755166


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2131/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680754700


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3872/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 opened a new pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-08-26 Thread GitBox


kunal642 opened a new pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading

2020-08-26 Thread Kunal Kapoor (Jira)
Kunal Kapoor created CARBONDATA-3961:


 Summary: Reorder filter according to the column storage ordinal to 
improve reading
 Key: CARBONDATA-3961
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
 Project: CarbonData
  Issue Type: Bug
Reporter: Kunal Kapoor
Assignee: Kunal Kapoor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-26 Thread GitBox


ajantha-bhat commented on a change in pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#discussion_r477095488



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##
@@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand(
 tuple._2.asJava)
 }
   }
-  Some(UpdateTableModel(true, trxMgr.getLatestTrx,
-executorErrors, tuple._2, true))
+  Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx,
+executorErrors, tuple._2, loadAsNewSegment = true))
 } else {
   None
 }
 
-CarbonInsertIntoWithDf(
-  databaseNameOp = Some(carbonTable.getDatabaseName),
+val dataFrame = loadDF.select(tableCols.map(col): _*)
+CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName),
   tableName = carbonTable.getTableName,
-  options = Map("fileheader" -> header, "sort_scope" -> "nosort"),
+  options = Map("fileheader" -> header, "sort_scope" -> "no_sort"),

Review comment:
   check this.
   https://github.com/apache/carbondata/pull/3901





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

2020-08-26 Thread GitBox


ajantha-bhat commented on pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680714175


   @QiangCai , @ravipesala @marchpure @akashrn5 : please check this



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-26 Thread GitBox


ajantha-bhat commented on a change in pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#discussion_r477095415



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##
@@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand(
 tuple._2.asJava)
 }
   }
-  Some(UpdateTableModel(true, trxMgr.getLatestTrx,
-executorErrors, tuple._2, true))
+  Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx,
+executorErrors, tuple._2, loadAsNewSegment = true))
 } else {
   None
 }
 
-CarbonInsertIntoWithDf(
-  databaseNameOp = Some(carbonTable.getDatabaseName),
+val dataFrame = loadDF.select(tableCols.map(col): _*)
+CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName),
   tableName = carbonTable.getTableName,
-  options = Map("fileheader" -> header, "sort_scope" -> "nosort"),
+  options = Map("fileheader" -> header, "sort_scope" -> "no_sort"),

Review comment:
   @akashrn5 : instead of fixing no_sort, we should have analyzed why it was 
added. Originally it was using the target table's sort scope only; #3764 added 
this, so it should have been removed here.
   Now that you changed it to no_sort, the target table's sort_scope is not 
used (see the sketch below).
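   
   A hypothetical sketch of deriving the option from the target table instead 
of hard-coding it (property key and default here are assumptions; pls verify 
against the table schema API):
   
   ```
   // read the target table's sort_scope instead of hard-coding "no_sort"
   val targetSortScope = carbonTable.getTableInfo.getFactTable
     .getTableProperties.getOrDefault("sort_scope", "local_sort")
   // then: options = Map("fileheader" -> header, "sort_scope" -> targetSortScope)
   ```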
   
   
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat opened a new pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope

2020-08-26 Thread GitBox


ajantha-bhat opened a new pull request #3901:
URL: https://github.com/apache/carbondata/pull/3901


### Why is this PR needed?
#3764 added nosort (this was wrong code, but had no functional impact as it 
did not change the new segment load to no_sort)
#3856 changed it to no_sort (this creates a functional impact by changing the 
target table's new segment load to use no_sort)

### What changes were proposed in this PR?
   CDC update as new segment should use target table sort_scope
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No (verified manually the flows)
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk

2020-08-26 Thread GitBox


dependabot[bot] commented on pull request #3447:
URL: https://github.com/apache/carbondata/pull/3447#issuecomment-680710899


   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene

2020-08-26 Thread GitBox


dependabot[bot] commented on pull request #3456:
URL: https://github.com/apache/carbondata/pull/3456#issuecomment-680710764


   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-26 Thread GitBox


asfgit closed pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-26 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-680702383


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2130/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3899: [WIP] Aviod cleaning segments after reading tablestatus failed

2020-08-26 Thread GitBox


QiangCai commented on pull request #3899:
URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680701522


   @kunal642 please check this modification.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080590



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String 
dimensionValue, String dateFormat) {
 dateFormatter = timestampFormatter.get();
   }
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  LOGGER.info("Parsed data with lenience as true, setting back to 
default mode");
+  return timeValue;
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080281



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String 
dimensionValue, String dateFormat) {
 dateFormatter = timestampFormatter.get();
   }
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  dateFormatter.setLenient(true);

Review comment:
   added an example.
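   
   For context, a hedged illustration of the kind of value lenient parsing 
recovers (timezone and timestamp are assumptions chosen to hit a DST gap; 
assumes java.text.SimpleDateFormat and java.util imports):
   
   ```
   SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
   fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
   fmt.setLenient(false);
   // "2020-03-08 02:30:00" falls in the spring-forward gap, so strict
   // parsing throws ParseException
   fmt.setLenient(true);
   Date parsed = fmt.parse("2020-03-08 02:30:00"); // lenient rolls it forward
   ```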





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org