[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478195952

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java

## @@ -823,6 +829,31 @@ public void write(Object object) throws IOException {
     }
   }

+  /**
+   * Load data of all avro files at given location iteratively.
+   *
+   * @throws IOException
+   */
+  @Override
+  public void write() throws IOException {
+    if (this.dataFiles == null || this.dataFiles.length == 0) {
+      throw new RuntimeException("'withAvroPath()' must be called to support loading avro files");
+    }
+    Arrays.sort(this.dataFiles, Comparator.comparing(CarbonFile::getPath));

Review comment: Is this sort required?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
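One likely answer to the reviewer's question: directory listings are not guaranteed to come back in any particular order, so sorting by path makes the iterative load order deterministic across filesystems and runs. A minimal sketch of the idea, using a hypothetical `CarbonFileStub` stand-in for `CarbonFile` (only `getPath()` matters here):

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for CarbonFile; only getPath() is relevant.
class CarbonFileStub {
    private final String path;
    CarbonFileStub(String path) { this.path = path; }
    String getPath() { return path; }
}

public class DeterministicOrder {
    // Directory listings are filesystem-dependent; sorting by path makes the
    // order in which files are loaded reproducible across runs and stores.
    static CarbonFileStub[] sortByPath(CarbonFileStub[] files) {
        Arrays.sort(files, Comparator.comparing(CarbonFileStub::getPath));
        return files;
    }

    public static void main(String[] args) {
        CarbonFileStub[] files = {
            new CarbonFileStub("/data/part-2.avro"),
            new CarbonFileStub("/data/part-0.avro"),
            new CarbonFileStub("/data/part-1.avro"),
        };
        for (CarbonFileStub f : sortByPath(files)) {
            System.out.println(f.getPath());
        }
    }
}
```

If deterministic ordering is not needed for correctness, the sort could indeed be dropped, which is presumably the point of the question.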
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r478194800

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/AvroCarbonWriter.java

## @@ -25,17 +25,12 @@
 import java.math.BigDecimal;
 import java.math.BigInteger;
 import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-import java.util.UUID;
+import java.util.*;

Review comment: Please do not use wildcard imports.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681629618

Build failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2142/
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3819: [CARBONDATA-3855] Support carbon SDK to load data from different files
VenuReddy2103 commented on a change in pull request #3819:
URL: https://github.com/apache/carbondata/pull/3819#discussion_r472407770

## File path: sdk/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java

## @@ -594,6 +607,446 @@ public CarbonWriterBuilder withJsonInput(Schema carbonSchema) {
     return this;
   }

+  private void validateCsvFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.CSV_FILE_EXTENSION);
+    if (CollectionUtils.isEmpty(Arrays.asList(dataFiles))) {
+      throw new RuntimeException("CSV files can't be empty.");
+    }
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        CsvParser csvParser = SDKUtil.buildCsvParser(this.hadoopConf);
+        csvParser.beginParsing(FileFactory.getDataInputStream(dataFile.getPath(),
+            -1, this.hadoopConf));
+      } catch (IllegalArgumentException ex) {
+        if (ex.getCause() instanceof FileNotFoundException) {
+          throw new FileNotFoundException("File " + dataFile
+              + " not found to build carbon writer.");
+        }
+        throw ex;
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading CSV files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withCsvInput();
+    this.validateCsvFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts a CSV files directory and
+   * the list of files which have to be loaded.
+   *
+   * @param filePath directory where the CSV files exist.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withCsvPath(String filePath, List fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withCsvPath(filePath);
+    return this;
+  }
+
+  private void validateJsonFiles() throws IOException {
+    CarbonFile[] dataFiles = this.extractDataFiles(CarbonCommonConstants.JSON_FILE_EXTENSION);
+    for (CarbonFile dataFile : dataFiles) {
+      try {
+        new JSONParser().parse(SDKUtil.buildJsonReader(dataFile, this.hadoopConf));
+      } catch (FileNotFoundException ex) {
+        throw new FileNotFoundException("File " + dataFile + " not found to build carbon writer.");
+      } catch (ParseException ex) {
+        throw new RuntimeException("File " + dataFile + " is not in json format.");
+      }
+    }
+    this.dataFiles = dataFiles;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading JSON files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.withJsonInput();
+    this.validateJsonFiles();
+    return this;
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts a JSON file directory and
+   * the list of files which have to be loaded.
+   *
+   * @param filePath directory where the json files exist.
+   * @param fileList list of files which have to be loaded.
+   * @return CarbonWriterBuilder
+   * @throws IOException
+   */
+  public CarbonWriterBuilder withJsonPath(String filePath, List fileList)
+      throws IOException {
+    this.fileList = fileList;
+    this.withJsonPath(filePath);
+    return this;
+  }
+
+  private void validateFilePath(String filePath) {
+    if (StringUtils.isEmpty(filePath)) {
+      throw new IllegalArgumentException("filePath can not be empty");
+    }
+  }
+
+  /**
+   * to build a {@link CarbonWriter}, which accepts loading Parquet files.
+   *
+   * @param filePath absolute path under which files should be loaded.
+   * @return CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder withParquetPath(String filePath) throws IOException {
+    this.validateFilePath(filePath);
+    this.filePath = filePath;
+    this.setIsDirectory(filePath);
+    this.writerType = WRITER_TYPE.PARQUET;
+    this.validateParquetFiles();
+    return this;
+  }
+
+  private void setIsDirectory(String filePath) {
+    if (this.hadoopConf == null) {
+      this.hadoopConf = new Configuration(FileFactory.getConfiguration());

Review comment: Checked the base code. There, we seem to directly assign the return value of FileFactory.getConfiguration() instead of wrapping it in new Configuration. Suggest checking and keeping it consistent in all the places in this PR.
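The validation methods above share one pattern worth noting: probe every input file up front and surface the offending path in the error, rather than failing midway through a load. A minimal sketch of that fail-fast pattern, with a hypothetical `validateDataFiles` helper and plain `java.io.File` in place of the SDK's `CarbonFile`:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.List;

public class FileValidator {
    // Fail fast: check all input files before any data is written, and name
    // the missing file in the exception so the caller can fix the right path.
    static void validateDataFiles(List<String> paths) throws FileNotFoundException {
        if (paths == null || paths.isEmpty()) {
            throw new IllegalArgumentException("Data files can't be empty.");
        }
        for (String path : paths) {
            if (!new File(path).isFile()) {
                throw new FileNotFoundException(
                    "File " + path + " not found to build carbon writer.");
            }
        }
    }
}
```

The SDK versions additionally parse each file (CSV or JSON) during validation, so malformed content is also rejected before the writer is built.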
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-681626492

Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3883/
[GitHub] [carbondata] Indhumathi27 closed pull request #3900: [WIP] Test
Indhumathi27 closed pull request #3900:
URL: https://github.com/apache/carbondata/pull/3900
[GitHub] [carbondata] VenuReddy2103 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-681588563

LGTM
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

VenuReddy2103 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r478153107

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala

## @@ -306,7 +315,51 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA
     }
   }

+  test("test load, update data with setlenient carbon property for daylight " +
+       "saving time from different timezone") {
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE, "true")
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
+        "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("DROP TABLE test_time")
+    CarbonProperties.getInstance()
+      .removeProperty(CarbonCommonConstants.CARBON_LOAD_DATEFORMAT_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for daylight " +
+       "saving time from different timezone") {
+    sql("set carbon.load.dateformat.setlenient.enable = true")
+    TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+    sql("DROP TABLE IF EXISTS test_time")
+    sql("CREATE TABLE IF NOT EXISTS test_time (ID Int, date Date, time Timestamp) STORED AS carbondata " +
+        "TBLPROPERTIES('dateformat'='yyyy-MM-dd', 'timestampformat'='yyyy-MM-dd HH:mm:ss') ")
+    sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time")
+    sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+    sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 1"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 11"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    checkAnswer(sql("SELECT time FROM test_time WHERE ID = 2"), Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))))
+    sql("DROP TABLE test_time")
+    defaultConfig()
+  }
+
+  def generateCSVFile(): Unit = {
+    val rows = new ListBuffer[Array[String]]
+    rows += Array("ID", "date", "time")
+    rows += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+    rows += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+    BadRecordUtil.createCSV(rows, csvPath)
+  }
+
   override def afterAll {
     sql("DROP TABLE IF EXISTS t3")
+    FileUtils.forceDelete(new File(csvPath))
+    TimeZone.setDefault(defaultTimeZone)

Review comment: `afterAll()` is called only once per testcase file, so you have to set back `TimeZone.setDefault(defaultTimeZone)` at the end of each testcase that changed the zone to Asia/Shanghai.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in Inde
kunal642 commented on a change in pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#discussion_r477384526

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

## @@ -1280,6 +1280,11 @@ private CarbonCommonConstants() {
   public static final String CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE =
       "carbon.max.executor.lru.cache.size";

+  /**
+   * when executor LRU cache is not configured, set it to 70% of executor memory size
+   */
+  public static final double CARBON_DEFAULT_EXECUTOR_LRU_CACHE_PERCENT = 0.7d;

Review comment: Please expose a property so that the user can set the percentage to use as the LRU cache size, and make sure the user cannot set it to invalid values such as negative values, 0 (0%), 1 (100%), etc.
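The requested validation can be sketched as a parse-with-bounds helper: accept only values strictly between 0 and 1 and fall back to the default otherwise. The method and property handling here are illustrative, not the PR's actual implementation:

```java
public class LruCachePercentValidator {
    // Parse a user-supplied LRU cache percentage. Values outside (0, 1),
    // including the degenerate 0 (0%) and 1 (100%) cases the reviewer calls
    // out, and unparseable strings all fall back to the default.
    static double parseLruPercent(String value, double defaultPercent) {
        double percent;
        try {
            percent = Double.parseDouble(value);
        } catch (NumberFormatException | NullPointerException e) {
            return defaultPercent;
        }
        if (percent <= 0.0 || percent >= 1.0) {
            return defaultPercent;
        }
        return percent;
    }

    public static void main(String[] args) {
        System.out.println(parseLruPercent("0.5", 0.7));  // valid, used as-is
        System.out.println(parseLruPercent("-1", 0.7));   // invalid, default
        System.out.println(parseLruPercent("abc", 0.7));  // unparseable, default
    }
}
```

Falling back silently versus failing fast is a design choice; for a long-running index server, logging a warning and using the default is often preferable to refusing to start.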
[GitHub] [carbondata] kunal642 commented on a change in pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in Inde
kunal642 commented on a change in pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#discussion_r477382736

## File path: integration/spark/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala

## @@ -243,11 +243,26 @@ object IndexServer extends ServerInterface {
   def main(args: Array[String]): Unit = {
     if (serverIp.isEmpty) {
       throw new RuntimeException(s"Please set the server IP to use Index Cache Server")
-    } else if (!isExecutorLRUConfigured) {
-      throw new RuntimeException(s"Executor LRU cache size is not set. Please set using " +
-        s"${ CarbonCommonConstants.CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE }")
     } else {
       createCarbonSession()
+      if (!isExecutorLRUConfigured) {

Review comment: This is the wrong place to set the executor memory based on the percentage. If you set it here, then only the driver knows about CARBON_MAX_EXECUTOR_LRU_CACHE_SIZE; the executor would still use the default value. Please move this to the CacheProvider class.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent
kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r477356848

## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/events/SILoadEventListenerForFailedSegments.scala

## @@ -57,189 +58,201 @@ class SILoadEventListenerForFailedSegments extends OperationEventListener with L
     event match {
       case postStatusUpdateEvent: LoadTablePostStatusUpdateEvent =>
         LOGGER.info("Load post status update event-listener called")
-        val loadTablePostStatusUpdateEvent = event.asInstanceOf[LoadTablePostStatusUpdateEvent]
-        val carbonLoadModel = loadTablePostStatusUpdateEvent.getCarbonLoadModel
-        val sparkSession = SparkSession.getActiveSession.get
-        // when Si creation and load to main table are parallel, get the carbonTable from the
-        // metastore which will have the latest index Info
-        val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
-        val carbonTable = metaStore
-          .lookupRelation(Some(carbonLoadModel.getDatabaseName),
-            carbonLoadModel.getTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable
-        val indexMetadata = carbonTable.getIndexMetadata
-        val secondaryIndexProvider = IndexType.SI.getIndexProviderName
-        if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
+        if (CarbonProperties.getInstance().isSIRepairEnabled) {
+          val loadTablePostStatusUpdateEvent = event.asInstanceOf[LoadTablePostStatusUpdateEvent]
+          val carbonLoadModel = loadTablePostStatusUpdateEvent.getCarbonLoadModel
+          val sparkSession = SparkSession.getActiveSession.get
+          // when Si creation and load to main table are parallel, get the carbonTable from the
+          // metastore which will have the latest index Info
+          val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+          val carbonTable = metaStore
+            .lookupRelation(Some(carbonLoadModel.getDatabaseName),
+              carbonLoadModel.getTableName)(sparkSession).asInstanceOf[CarbonRelation].carbonTable
+          val indexMetadata = carbonTable.getIndexMetadata
+          val secondaryIndexProvider = IndexType.SI.getIndexProviderName
+          if (null != indexMetadata && null != indexMetadata.getIndexesMap &&
              null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) {
-          val indexTables = indexMetadata.getIndexesMap
-            .get(secondaryIndexProvider).keySet().asScala
-          // if there are no index tables for a given fact table do not perform any action
-          if (indexTables.nonEmpty) {
-            val mainTableDetails =
-              SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
-            indexTables.foreach {
-              indexTableName =>
-                val isLoadSIForFailedSegments = sparkSession.sessionState.catalog
-                  .getTableMetadata(TableIdentifier(indexTableName,
-                    Some(carbonLoadModel.getDatabaseName))).storage.properties
-                  .getOrElse("isSITableEnabled", "true").toBoolean
-                val indexTable = metaStore
-                  .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
-                    sparkSession)
-                  .asInstanceOf[CarbonRelation]
-                  .carbonTable
+            val indexTables = indexMetadata.getIndexesMap
+              .get(secondaryIndexProvider).keySet().asScala
+            // if there are no index tables for a given fact table do not perform any action
+            if (indexTables.nonEmpty) {
+              val mainTableDetails =
+                SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+              indexTables.foreach {
+                indexTableName =>
+                  val isLoadSIForFailedSegments = sparkSession.sessionState.catalog
+                    .getTableMetadata(TableIdentifier(indexTableName,
+                      Some(carbonLoadModel.getDatabaseName))).storage.properties
+                    .getOrElse("isSITableEnabled", "true").toBoolean
+                  val indexTable = metaStore
+                    .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+                      sparkSession)
+                    .asInstanceOf[CarbonRelation]
+                    .carbonTable

-                val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-                  SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
-                val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
-                  SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
-                var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
-                if (!isLoadSIForFailedSegments
+                  val mainTblLoadMetadataDetails
[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent
kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533562

## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

## @@ -2262,6 +2262,32 @@ private CarbonCommonConstants() {
   public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT = 500;

+  /**
+   * Configured property to enable/disable load failed segments in SI table during
+   * load/insert command.
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SI_REPAIR = "carbon.load.si.repair";
+
+  /**
+   * Default value for load failed segments in SI table during
+   * load/insert command.
+   */
+  public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true";
+
+  /**
+   * Property to give a limit to the number of segments that are reloaded in the
+   * SI table in the FailedSegments listener.
+   */
+  @CarbonProperty
+  public static final String CARBON_SI_REPAIR_LIMIT = "carbon.si.repair.limit";

Review comment:
1. I think it is better to have these two properties at the table level instead of global.
2. If this is a dynamic conf, then add `(dynamicConfigurable = true)` to the annotation.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477352862

## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala

## @@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }

+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+      // If it is empty, then no need to do further computations because the
+      // tablestatus might not have been created and hence next load will take care
+      if (details.isEmpty) {
+        Seq.empty
+      }
+
+      val failedLoadMetadataDetails: java.util.List[LoadMetadataDetails] = new util
+        .ArrayList[LoadMetadataDetails]()
+
+      // read the details of SI table and get all the failed segments during SI
+      // creation which are MARKED_FOR_DELETE or invalid INSERT_IN_PROGRESS
+      details.collect {

Review comment: Change to foreach.
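The reviewer's "change to foreach" point is about intent: Scala's `collect` implies building a transformed result, while this loop exists only for its side effect (accumulating failed segments). The same distinction exists in Java between stream collectors and `forEach`; a minimal Java analog with illustrative segment-status strings:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachVsCollect {
    // The loop exists only for its side effect (filling `failed`), so an
    // explicit forEach states that intent; a collect/collector implies a
    // transformed result that is consumed afterwards.
    static List<String> failedSegments(List<String> statuses) {
        List<String> failed = new ArrayList<>();
        statuses.forEach(status -> {
            if (!status.equals("SUCCESS")) {
                failed.add(status);
            }
        });
        return failed;
    }

    public static void main(String[] args) {
        List<String> statuses =
            Arrays.asList("SUCCESS", "MARKED_FOR_DELETE", "INSERT_IN_PROGRESS");
        System.out.println(failedSegments(statuses));
    }
}
```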
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477351673

## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala

## @@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }

+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),
+        indexMetadata.getParentTableName,
+        indexColumns.split(",").toList,
+        indexTableName)
+
+      var details = SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)

Review comment: Why is a second read required for SI?
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873:
URL: https://github.com/apache/carbondata/pull/3873#discussion_r477351380

## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala

## @@ -377,4 +381,212 @@ object CarbonIndexUtil {
       AlterTableUtil.releaseLocks(locks.asScala.toList)
     }
   }

+  def processSIRepair(indexTableName: String, carbonTable: CarbonTable,
+      carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata,
+      mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String)
+    (sparkSession: SparkSession): Unit = {
+    val sparkSession = SparkSession.getActiveSession.get
+    // val databaseName = sparkSession.catalog.currentDatabase
+    // when Si creation and load to main table are parallel, get the carbonTable from the
+    // metastore which will have the latest index Info
+    val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
+    val indexTable = metaStore
+      .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)(
+        sparkSession)
+      .asInstanceOf[CarbonRelation]
+      .carbonTable
+
+    val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath)
+    val siTblLoadMetadataDetails: Array[LoadMetadataDetails] =
+      SegmentStatusManager.readLoadMetadata(indexTable.getMetadataPath)
+    var segmentLocks: ListBuffer[ICarbonLock] = ListBuffer.empty
+    if (!CarbonInternalLoaderUtil.checkMainTableSegEqualToSISeg(
+        mainTblLoadMetadataDetails,
+        siTblLoadMetadataDetails)) {
+      val indexColumns = indexMetadata.getIndexColumns(secondaryIndexProvider,
+        indexTableName)
+      val secondaryIndex = IndexModel(Some(carbonTable.getDatabaseName),

Review comment: Change the variable name to indexModel.
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477350602 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get +// val databaseName = sparkSession.catalog.currentDatabase +// when Si creation and load to main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val indexTable = metaStore + .lookupRelation(Some(carbonLoadModel.getDatabaseName), indexTableName)( +sparkSession) + .asInstanceOf[CarbonRelation] + .carbonTable + +val mainTblLoadMetadataDetails: Array[LoadMetadataDetails] = + SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) Review comment: consider passing the loadMetadata details from the caller to avoid multiple reading. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
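The "read once in the caller, pass down" refactor the reviewer asks for can be sketched generically. This is a hypothetical Java illustration, not CarbonData code: `readLoadMetadata`, `processSIRepair`, and the `String` segment type are simplified stand-ins for `SegmentStatusManager.readLoadMetadata` and the Scala method under review.

```java
import java.util.List;

public class LoadMetadataFlow {
    // Counts how many times the (simulated) table-status file is read.
    static int readCount = 0;

    // Stand-in for SegmentStatusManager.readLoadMetadata: one file-system read.
    static List<String> readLoadMetadata(String metadataPath) {
        readCount++; // stands in for an actual table-status file read
        return List.of("segment-0", "segment-1");
    }

    // Stand-in for processSIRepair: uses the details passed in,
    // instead of calling readLoadMetadata again itself.
    static void processSIRepair(List<String> mainTblLoadMetadataDetails) {
        for (String segment : mainTblLoadMetadataDetails) {
            // repair logic per segment would go here
        }
    }

    public static void main(String[] args) {
        List<String> details = readLoadMetadata("/table/Metadata"); // read once
        processSIRepair(details);                                   // pass down
        System.out.println(readCount); // prints 1
    }
}
```

The caller owns the single read; every callee that needs the load metadata receives it as a parameter, so the status file is touched once per command instead of once per helper.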
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477348891 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -377,4 +381,212 @@ object CarbonIndexUtil { AlterTableUtil.releaseLocks(locks.asScala.toList) } } + + def processSIRepair(indexTableName: String, carbonTable: CarbonTable, +carbonLoadModel: CarbonLoadModel, indexMetadata: IndexMetadata, + mainTableDetails: List[LoadMetadataDetails], secondaryIndexProvider: String) + (sparkSession: SparkSession) : Unit = { +val sparkSession = SparkSession.getActiveSession.get Review comment: please use the sparkSession which is passed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477348313 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to 
main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val mainCarbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(mainCarbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager + .readTableStatusFile(tableStatusFilePath).toList.asJava) +carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable)) + +val indexMetadata = mainCarbonTable.getIndexMetadata +val secondaryIndexProvider = IndexType.SI.getIndexProviderName +if (null != indexMetadata && null != indexMetadata.getIndexesMap && + null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) { + val indexTables = indexMetadata.getIndexesMap +.get(secondaryIndexProvider).keySet().asScala + // if there are no index tables for a given fact table do not perform any action + if (indexTables.nonEmpty) { +val mainTableDetails = if (segments.isEmpty) { + carbonLoadModel.getLoadMetadataDetails.asScala.toList + // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +} else { + // get segments for main table + carbonLoadModel.getLoadMetadataDetails.asScala.toList.filter( +
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477347873 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get +// when Si creation and load to 
main table are parallel, get the carbonTable from the +// metastore which will have the latest index Info +val metaStore = CarbonEnv.getInstance(sparkSession).carbonMetaStore +val mainCarbonTable = metaStore + .lookupRelation(Some(databaseName), tableNameOp)(sparkSession) + .asInstanceOf[CarbonRelation].carbonTable + +val carbonLoadModel = new CarbonLoadModel +carbonLoadModel.setDatabaseName(databaseName) +carbonLoadModel.setTableName(tableNameOp) +carbonLoadModel.setTablePath(mainCarbonTable.getTablePath) +val tableStatusFilePath = CarbonTablePath.getTableStatusFilePath(mainCarbonTable.getTablePath) +carbonLoadModel.setLoadMetadataDetails(SegmentStatusManager + .readTableStatusFile(tableStatusFilePath).toList.asJava) +carbonLoadModel.setCarbonDataLoadSchema(new CarbonDataLoadSchema(mainCarbonTable)) + +val indexMetadata = mainCarbonTable.getIndexMetadata +val secondaryIndexProvider = IndexType.SI.getIndexProviderName +if (null != indexMetadata && null != indexMetadata.getIndexesMap && + null != indexMetadata.getIndexesMap.get(secondaryIndexProvider)) { + val indexTables = indexMetadata.getIndexesMap +.get(secondaryIndexProvider).keySet().asScala + // if there are no index tables for a given fact table do not perform any action + if (indexTables.nonEmpty) { +val mainTableDetails = if (segments.isEmpty) { + carbonLoadModel.getLoadMetadataDetails.asScala.toList + // SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath) +} else { + // get segments for main table + carbonLoadModel.getLoadMetadataDetails.asScala.toList.filter( +
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477343487 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, +indexTableToRepair: Option[String], segments: Option[List[String]]): Unit = { +val sparkSession = SparkSession.getActiveSession.get Review comment: pass the 
sparksession from the caller
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477343137 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase + } else { +tableIdentifier.database.get + } + triggerRepair(tableIdentifier.table, databaseName, indexnameOp, segments) +} else { + // repairing si for all index tables in the mentioned database in the repair command + sparkSession.sessionState.catalog.listTables(dbName).foreach { +tableIdent => + triggerRepair(tableIdent.table, dbName, indexnameOp, segments) + } +} +Seq.empty + } + + def triggerRepair(tableNameOp: String, databaseName: String, Review comment: change "tableNameOp" to "tableName" This is an automated message from the Apache Git Service. 
[GitHub] [carbondata] kunal642 commented on a change in pull request #3873: [CARBONDATA-3956] Reindex command on SI table
kunal642 commented on a change in pull request #3873: URL: https://github.com/apache/carbondata/pull/3873#discussion_r477342379 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/IndexRepairCommand.scala ## @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.index + +import java.util + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.command.DataCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.index.CarbonIndexUtil + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.metadata.index.IndexType +import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} + +/** + * Repair logic for reindex command on maintable/indextable + */ +case class IndexRepairCommand(indexnameOp: Option[String], tableIdentifier: TableIdentifier, + dbName: String, + segments: Option[List[String]]) extends DataCommand { + + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + + def processData(sparkSession: SparkSession): Seq[Row] = { +if (dbName == null) { + // dbName is null, repair for index table or all the index table in main table + val databaseName = if (tableIdentifier.database.isEmpty) { +SparkSession.getActiveSession.get.catalog.currentDatabase Review comment: please use the sparkSession which is passed as parameter to this method This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680907790 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3882/
[GitHub] [carbondata] kumarvishal09 commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-680904506 @ajantha-bhat By default, row filter push down is false. For complex types, e.g. an array of strings, the number of records can be much more than 32k, so filling in one shot can be a problem because creating one big array is an overhead. Also, in the direct-fill case we have a ReusableDataBuffer to create one big byte array per column for one blocklet's processing and reuse it across all the pages of that column. This won't help here, because the number of elements in a complex type can vary, the byte array size changes every page, and creating a big byte array for a page whose child has many elements can degrade query performance. So for complex types the user must configure the table page size while creating the table to get better query performance. If the number of rows in a page is higher, we can avoid the ReusableDataBuffer, as it may become an overhead. Some comments: 1. Please refactor the code and avoid duplicate code where you can. 2. Please add comments.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680902015 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2141/
[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313518 ## File path: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java ## @@ -153,6 +154,9 @@ private ReusableDataBuffer[] measureReusableBuffer; + // index used by dimensionReusableBuffer + int dimensionReusableBufferIndex = 0; Review comment: private int dimensionReusableBufferIndex = 0;
[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r477313220 ## File path: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java ## @@ -98,6 +98,9 @@ */ protected CarbonIterator queryIterator; + // Size of the ReusableDataBuffer based on the number of dimension projection columns + protected int reusableDimensionBufferSize = 0; Review comment: an int field defaults to 0, so remove the explicit initializer
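The reviewer's point follows from Java's default field initialization: numeric instance fields are zero-initialized by the JVM, so the explicit `= 0` adds nothing. A minimal demonstration (the field name is copied from the diff; the class around it is made up for the demo):

```java
public class Defaults {
    // Mirrors the field from AbstractQueryExecutor; numeric instance fields
    // are zero-initialized automatically, so no "= 0" is needed.
    protected int reusableDimensionBufferSize;

    public static void main(String[] args) {
        System.out.println(new Defaults().reusableDimensionBufferSize); // prints 0
    }
}
```

Note this applies to fields (instance and static); local variables have no default value and must be assigned before use.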
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table
CarbonDataQA1 commented on pull request #3904: URL: https://github.com/apache/carbondata/pull/3904#issuecomment-680889796 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2140/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3904: [CARBONDATA-3962]Remove unwanted empty fact directory in case of flat_folder table
CarbonDataQA1 commented on pull request #3904: URL: https://github.com/apache/carbondata/pull/3904#issuecomment-680889554 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3881/
[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r477308991 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java ## @@ -376,6 +456,56 @@ private void fillVector(byte[] pageData, CarbonColumnVector vector, DataType vec DecimalConverterFactory.DecimalConverter decimalConverter = vectorInfo.decimalConverter; decimalConverter.fillVector(pageData, pageSize, vectorInfo, nullBits, pageDataType); } + } else if (pageDataType == DataTypes.BYTE_ARRAY) { +if (vectorDataType == DataTypes.STRING || vectorDataType == DataTypes.BINARY +|| vectorDataType == DataTypes.VARCHAR) { + // for complex primitive string, binary, varchar type + int offset = 0; + for (int i = 0; i < pageSize; i++) { +byte[] stringLen = new byte[DataTypes.INT.getSizeInBytes()]; Review comment: this byte array is not required; wrap the page data inside a ByteBuffer and advance the position to get the length of the data. This avoids creating a ByteBuffer and a byte array per value. Please handle this in all the places. ``` ByteBuffer buffer = ByteBuffer.allocate(12); buffer.putInt(1); buffer.putInt(2); buffer.putInt(3); buffer.rewind(); buffer.position(4); int anInt = buffer.getInt(); ```
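Applied to length-prefixed page data, the ByteBuffer pattern the reviewer sketches looks roughly like this. It is an illustrative sketch, not the actual DirectCompressCodec change: `readLengths` and the page layout (4-byte big-endian length followed by the value bytes) are assumptions made for the demo.

```java
import java.nio.ByteBuffer;

public class PageDataReader {
    // Instead of copying 4 bytes into a temporary array for every value,
    // wrap the page once and let the ByteBuffer track the read offset.
    static int[] readLengths(byte[] pageData, int rowCount) {
        ByteBuffer buffer = ByteBuffer.wrap(pageData);
        int[] lengths = new int[rowCount];
        for (int i = 0; i < rowCount; i++) {
            int len = buffer.getInt();                 // read the length prefix
            lengths[i] = len;
            buffer.position(buffer.position() + len);  // skip over the value bytes
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Build a page holding two length-prefixed values: "ab" and "xyz".
        ByteBuffer page = ByteBuffer.allocate(4 + 2 + 4 + 3);
        page.putInt(2).put((byte) 'a').put((byte) 'b');
        page.putInt(3).put((byte) 'x').put((byte) 'y').put((byte) 'z');
        int[] lengths = readLengths(page.array(), 2);
        System.out.println(lengths[0] + "," + lengths[1]); // prints 2,3
    }
}
```

The single wrap allocates no per-value garbage: `getInt()` decodes the prefix in place and `position(...)` steps past the payload, which is the saving the reviewer is after.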
[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r477292131 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java ## @@ -260,4 +265,65 @@ protected String debugInfo() { return this.toString(); } + public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize, + CarbonColumnVector vector, DataType vectorDataType) { +VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType); +Stack vectorStack = vectorInfo.getVectorStack(); +// check and update to child vector info +if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) { + if (DataTypes.isArrayType(vectorDataType)) { +List childElementsCountForEachRow = +((CarbonColumnVectorImpl) vector.getColumnVector()) +.getNumberOfChildrenElementsInEachRow(); +int newPageSize = 0; +for (int childElementsCount : childElementsCountForEachRow) { + newPageSize += childElementsCount; +} +vectorUtil.setPageSize(newPageSize); + } + // child vector flow, so fill the child vector + CarbonColumnVector childVector = vectorStack.pop(); + vectorUtil.setVector(childVector); + vectorUtil.setVectorDataType(childVector.getType()); +} +return vectorUtil; + } + + // Utility class to update current vector to child vector in case of complex type handling + public static class VectorUtil {
private int pageSize; +private CarbonColumnVector vector; +private DataType vectorDataType; + +private VectorUtil(int pageSize, CarbonColumnVector vector, DataType vectorDataType) { + this.pageSize = pageSize; + this.vector = vector; + this.vectorDataType = vectorDataType; +} + +public int getPageSize() { Review comment: we can use an org.apache.commons.lang3.tuple class for returning 3 values from one method; please check if you can use it here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
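The reviewer's suggestion refers to `org.apache.commons.lang3.tuple`, where `Triple.of(pageSize, vector, dataType)` with `getLeft()/getMiddle()/getRight()` would replace the hand-rolled `VectorUtil` holder. To keep this example dependency-free, the sketch below is a minimal stand-in whose name and accessors mirror the commons-lang3 `ImmutableTriple` API; the usage in the test is illustrative, not CarbonData code.

```java
// Minimal stand-in mirroring org.apache.commons.lang3.tuple.ImmutableTriple,
// shown so the example compiles without a third-party jar. With commons-lang3
// on the classpath, Triple.of(left, middle, right) gives the same shape.
class ImmutableTriple<L, M, R> {
  private final L left;
  private final M middle;
  private final R right;

  private ImmutableTriple(L left, M middle, R right) {
    this.left = left;
    this.middle = middle;
    this.right = right;
  }

  // Factory method, matching the commons-lang3 naming convention.
  static <L, M, R> ImmutableTriple<L, M, R> of(L left, M middle, R right) {
    return new ImmutableTriple<>(left, middle, right);
  }

  L getLeft() { return left; }
  M getMiddle() { return middle; }
  R getRight() { return right; }
}
```

A method like `checkAndUpdateToChildVector` could then return `Triple<Integer, CarbonColumnVector, DataType>` directly instead of defining the `VectorUtil` inner class with its setters and getters.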
[GitHub] [carbondata] kumarvishal09 commented on a change in pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
kumarvishal09 commented on a change in pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#discussion_r477289221 ## File path: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveCodec.java ## @@ -260,4 +265,65 @@ protected String debugInfo() { return this.toString(); } + public static VectorUtil checkAndUpdateToChildVector(ColumnVectorInfo vectorInfo, int pageSize, + CarbonColumnVector vector, DataType vectorDataType) { +VectorUtil vectorUtil = new VectorUtil(pageSize, vector, vectorDataType); +Stack vectorStack = vectorInfo.getVectorStack(); +// check and update to child vector info +if (vectorStack != null && vectorStack.peek() != null && vectorDataType.isComplexType()) { Review comment: use Objects.nonNull for the not-null case and Objects.isNull for the null case; please handle this in all applicable places. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
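Applied to the quoted condition, the `java.util.Objects` helpers read like this. A hedged sketch, not the actual CarbonData change: the `hasChildVector` helper is invented for illustration, and it adds an `isEmpty()` guard that the original condition does not have (a bare `peek()` on an empty `Stack` would throw `EmptyStackException`).

```java
import java.util.Objects;
import java.util.Stack;

class NullChecks {
  // Hypothetical helper showing Objects.nonNull in place of "!= null".
  // The isEmpty() guard is an addition for safety, since Stack.peek()
  // throws EmptyStackException on an empty stack.
  static boolean hasChildVector(Stack<Object> vectorStack) {
    return Objects.nonNull(vectorStack)
        && !vectorStack.isEmpty()
        && Objects.nonNull(vectorStack.peek());
  }
}
```

`Objects.nonNull`/`Objects.isNull` behave identically to the `!=`/`==` checks; their main benefit is readability and usability as method references (e.g. `stream.filter(Objects::nonNull)`).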
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [WIP]Presto write transactional
CarbonDataQA1 commented on pull request #3875: URL: https://github.com/apache/carbondata/pull/3875#issuecomment-680843631 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3880/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3875: [WIP]Presto write transactional
CarbonDataQA1 commented on pull request #3875: URL: https://github.com/apache/carbondata/pull/3875#issuecomment-680843243 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2139/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
CarbonDataQA1 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-680840597 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2138/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3903: [CARBONDATA-3963]Fix hive timestamp data mismatch issue and empty data during query issue
CarbonDataQA1 commented on pull request #3903: URL: https://github.com/apache/carbondata/pull/3903#issuecomment-680833851 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3879/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3963) timestamp data is wrong in case of reading carbon via hive and other issue
Akash R Nilugal created CARBONDATA-3963: --- Summary: timestamp data is wrong in case of reading carbon via hive and other issue Key: CARBONDATA-3963 URL: https://issues.apache.org/jira/browse/CARBONDATA-3963 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal 1. timestamp data is wrong when the carbon table is read via hive 2. carbon is not giving any data in beeline when queried via hive -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3962) Empty fact dirs are present in case of flat folder, which are unnecessary
Akash R Nilugal created CARBONDATA-3962: --- Summary: Empty fact dirs are present in case of flat folder, which are unnecessary Key: CARBONDATA-3962 URL: https://issues.apache.org/jira/browse/CARBONDATA-3962 Project: CarbonData Issue Type: Bug Reporter: Akash R Nilugal Assignee: Akash R Nilugal Empty fact dirs are present in case of flat folder, which are unnecessary -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] akashrn5 opened a new pull request #3904: []Remove unwanted empty fact directory in case of flat_folder table
akashrn5 opened a new pull request #3904: URL: https://github.com/apache/carbondata/pull/3904 ### Why is this PR needed? In case of a flat folder, we write the data files directly at the table path, so the fact dir is not required; it is unwanted and remains as an empty dir. ### What changes were proposed in this PR? Remove empty fact dirs ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680828750 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2137/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680827562 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3878/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680807986 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3876/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] asfgit closed pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
asfgit closed pull request #3899: URL: https://github.com/apache/carbondata/pull/3899 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
kunal642 commented on a change in pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#discussion_r477201000 ## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ## @@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach { val tableStatusFile = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) FileFactory.getCarbonFile(tableStatusFile).delete() sql("insert into stale values('k')") -checkAnswer(sql("select * from stale"), Row("k")) +// if table lose tablestatus file, the system should keep all data. +checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k"))) Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
kunal642 commented on pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680798546 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-680791844 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2135/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
QiangCai commented on a change in pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#discussion_r477191769 ## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ## @@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach { val tableStatusFile = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) FileFactory.getCarbonFile(tableStatusFile).delete() sql("insert into stale values('k')") -checkAnswer(sql("select * from stale"), Row("k")) +// if table lose tablestatus file, the system should keep all data. +checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k"))) Review comment: for this test case, the result is two rows, the behavior is right. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
Indhumathi27 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680785361 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
QiangCai commented on a change in pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#discussion_r477183265 ## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ## @@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach { val tableStatusFile = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) FileFactory.getCarbonFile(tableStatusFile).delete() sql("insert into stale values('k')") -checkAnswer(sql("select * from stale"), Row("k")) +// if table lose tablestatus file, the system should keep all data. +checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k"))) Review comment: yes. 1. Now, after removing the tablestatus file, we consider all segments successful by default. 2. In the future, we can add a flag file into the segment when we mark the segment to be deleted; the flag file would record that the segment is mark_for_delete. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
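The flag-file idea in point 2 could look roughly like the sketch below. Everything here is an assumption for illustration: the `.segment_status` file name, the `mark_for_delete` content string, and the `SegmentStatusFlag` class do not exist in CarbonData; the point is only that segment status can survive the loss of the tablestatus file if it is also recorded inside the segment directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SegmentStatusFlag {
  // Hypothetical marker-file name; CarbonData defines no such file today.
  static final String FLAG_FILE = ".segment_status";

  // Write a flag file into the segment directory recording its status.
  static void markForDelete(Path segmentDir) {
    try {
      Files.write(segmentDir.resolve(FLAG_FILE),
          "mark_for_delete".getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Without a tablestatus file, a segment counts as successful unless flagged.
  static boolean isMarkedForDelete(Path segmentDir) {
    Path flag = segmentDir.resolve(FLAG_FILE);
    try {
      return Files.exists(flag)
          && new String(Files.readAllBytes(flag), StandardCharsets.UTF_8)
              .trim().equals("mark_for_delete");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A recovery path could then skip any segment whose directory carries the flag, instead of treating every surviving segment as successful.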
[GitHub] [carbondata] akashrn5 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
akashrn5 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680780532 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
kunal642 commented on a change in pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#discussion_r477173788 ## File path: integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala ## @@ -310,7 +310,8 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach { val tableStatusFile = CarbonTablePath.getTableStatusFilePath(carbonTable.getTablePath) FileFactory.getCarbonFile(tableStatusFile).delete() sql("insert into stale values('k')") -checkAnswer(sql("select * from stale"), Row("k")) +// if table lose tablestatus file, the system should keep all data. +checkAnswer(sql("select * from stale"), Seq(Row("k"), Row("k"))) Review comment: I think this behaviour would be harmful when the segment is marked for delete. If the segment is not removed, then the results would be incorrect. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope
CarbonDataQA1 commented on pull request #3901: URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680773026 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2134/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope
CarbonDataQA1 commented on pull request #3901: URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680771095 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3875/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Zhangshunyu commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
Zhangshunyu commented on pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680768668 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 opened a new pull request #3903: [wip]Fix hive timestamp data mismatch issue and empty data during query issue
akashrn5 opened a new pull request #3903: URL: https://github.com/apache/carbondata/pull/3903 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680760581 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2133/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3900: [WIP] Test
CarbonDataQA1 commented on pull request #3900: URL: https://github.com/apache/carbondata/pull/3900#issuecomment-680758465 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3874/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755507 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3873/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
CarbonDataQA1 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-680755157 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2132/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
CarbonDataQA1 commented on pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680755166 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2131/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3899: [HOTFIX] Aviod cleaning segments after reading tablestatus failed
CarbonDataQA1 commented on pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680754700 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3872/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 opened a new pull request #3902: [WIP][CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 opened a new pull request #3902: URL: https://github.com/apache/carbondata/pull/3902 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading
Kunal Kapoor created CARBONDATA-3961: Summary: Reorder filter according to the column storage ordinal to improve reading Key: CARBONDATA-3961 URL: https://issues.apache.org/jira/browse/CARBONDATA-3961 Project: CarbonData Issue Type: Bug Reporter: Kunal Kapoor Assignee: Kunal Kapoor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance
ajantha-bhat commented on a change in pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#discussion_r477095488 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala ## @@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand( tuple._2.asJava) } } - Some(UpdateTableModel(true, trxMgr.getLatestTrx, -executorErrors, tuple._2, true)) + Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx, +executorErrors, tuple._2, loadAsNewSegment = true)) } else { None } -CarbonInsertIntoWithDf( - databaseNameOp = Some(carbonTable.getDatabaseName), +val dataFrame = loadDF.select(tableCols.map(col): _*) +CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName), tableName = carbonTable.getTableName, - options = Map("fileheader" -> header, "sort_scope" -> "nosort"), + options = Map("fileheader" -> header, "sort_scope" -> "no_sort"), Review comment: check this. https://github.com/apache/carbondata/pull/3901 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope
ajantha-bhat commented on pull request #3901: URL: https://github.com/apache/carbondata/pull/3901#issuecomment-680714175 @QiangCai , @ravipesala @marchpure @akashrn5 : please check this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance
ajantha-bhat commented on a change in pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#discussion_r477095415 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala ## @@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand( tuple._2.asJava) } } - Some(UpdateTableModel(true, trxMgr.getLatestTrx, -executorErrors, tuple._2, true)) + Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx, +executorErrors, tuple._2, loadAsNewSegment = true)) } else { None } -CarbonInsertIntoWithDf( - databaseNameOp = Some(carbonTable.getDatabaseName), +val dataFrame = loadDF.select(tableCols.map(col): _*) +CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName), tableName = carbonTable.getTableName, - options = Map("fileheader" -> header, "sort_scope" -> "nosort"), + options = Map("fileheader" -> header, "sort_scope" -> "no_sort"), Review comment: @akashrn5 : instead of fixing no_sort, we should have analyzed why it was added. Originally it was using the target table sort scope only; #3764 added this, and it should have been removed here. Now that you changed it to no_sort, the target table sort_scope is not used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat opened a new pull request #3901: [CARBONDATA-3820] CDC update as new segment should use target table sort_scope
ajantha-bhat opened a new pull request #3901: URL: https://github.com/apache/carbondata/pull/3901 ### Why is this PR needed? #3764 added nosort (this was wrong code, but with no functional impact, as it did not change the new segment load to no_sort). #3856 changed it to no_sort (this creates a functional impact by changing the target table's new segment load to use no_sort). ### What changes were proposed in this PR? CDC update as new segment should use the target table sort_scope ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No (the flows were verified manually) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
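The fix this PR describes, inheriting the target table's sort_scope instead of hardcoding no_sort, amounts to a property lookup with a fallback. A hypothetical sketch (the `resolveSortScope` helper and the plain `Map` of table properties are illustrative, not the CarbonData API):

```java
import java.util.HashMap;
import java.util.Map;

class SortScopeResolver {
  // Illustrative only: resolve the CDC-update segment's sort_scope from the
  // target table's properties, falling back to "no_sort" when unset, instead
  // of unconditionally hardcoding "no_sort" in the insert options.
  static String resolveSortScope(Map<String, String> targetTableProperties) {
    return targetTableProperties.getOrDefault("sort_scope", "no_sort");
  }
}
```

The resolved value would then be passed in the insert command's options map in place of the literal `"sort_scope" -> "no_sort"` shown in the quoted diff.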
[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk
dependabot[bot] commented on pull request #3447: URL: https://github.com/apache/carbondata/pull/3447#issuecomment-680710899 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene
dependabot[bot] commented on pull request #3456: URL: https://github.com/apache/carbondata/pull/3456#issuecomment-680710764 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] asfgit closed pull request #3856: [CARBONDATA-3929]Improve CDC performance
asfgit closed pull request #3856: URL: https://github.com/apache/carbondata/pull/3856 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-680702383 Build failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2130/
[GitHub] [carbondata] QiangCai commented on pull request #3899: [WIP] Aviod cleaning segments after reading tablestatus failed
QiangCai commented on pull request #3899: URL: https://github.com/apache/carbondata/pull/3899#issuecomment-680701522 @kunal642 please check this modification.
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080590

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

```diff
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) {
         dateFormatter = timestampFormatter.get();
       }
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          dateFormatter.setLenient(true);
+          dateToStr = dateFormatter.parse(dimensionValue);
+          dateFormatter.setLenient(false);
+          timeValue = dateToStr.getTime();
+          validateTimeStampRange(timeValue);
+          LOGGER.info("Parsed data with lenience as true, setting back to default mode");
+          return timeValue;
+        } catch (ParseException ex) {
+          dateFormatter.setLenient(false);
```

Review comment: ok
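The strict-then-lenient retry pattern in the diff above can be sketched in isolation. This is a minimal, self-contained illustration, not CarbonData code: the class and method names are made up for the example, and it omits the `CarbonProperties` gate, range validation, and logging that the real `parseTimestamp` performs.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LenientParseSketch {

  // Parse with strict field checking first; on failure, retry with lenient
  // rollover (e.g. "2020-02-30" rolls over to 2020-03-01), then restore
  // strict mode so a reused formatter is not left lenient.
  static Date parseWithFallback(String value, String pattern) throws ParseException {
    SimpleDateFormat fmt = new SimpleDateFormat(pattern);
    fmt.setLenient(false);
    try {
      return fmt.parse(value);
    } catch (ParseException e) {
      fmt.setLenient(true);
      try {
        return fmt.parse(value);
      } finally {
        fmt.setLenient(false);
      }
    }
  }

  public static void main(String[] args) throws ParseException {
    // Feb 30 does not exist; strict parse fails, lenient retry rolls to March 1.
    Date d = parseWithFallback("2020-02-30 10:00:00", "yyyy-MM-dd HH:mm:ss");
    System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(d));
  }
}
```

Note the `finally` block: the real diff resets `setLenient(false)` in both the success and failure paths for the same reason, so a thread-local formatter does not silently stay lenient for later values.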
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080281

## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java

```diff
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String dimensionValue, String dateFormat) {
         dateFormatter = timestampFormatter.get();
       }
       dateToStr = dateFormatter.parse(dimensionValue);
-      return dateToStr.getTime();
+      timeValue = dateToStr.getTime();
+      validateTimeStampRange(timeValue);
+      return timeValue;
     } catch (ParseException e) {
-      throw new NumberFormatException(e.getMessage());
+      // If the parsing fails, try to parse again with setLenient to true if the property is set
+      if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+        try {
+          dateFormatter.setLenient(true);
```

Review comment: added an example.
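The daylight-saving scenario behind this fix ([CARBONDATA-3955], per the PR title) can be reproduced directly with `SimpleDateFormat`: a wall-clock time inside the spring-forward gap does not exist, so a non-lenient parse throws `ParseException`, while a lenient parse shifts it forward. A hypothetical demonstration (the class name and the choice of `America/Los_Angeles`, where clocks jumped from 02:00 to 03:00 on 2020-03-08, are assumptions for reproducibility, not CarbonData code):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DstGapSketch {

  // Strictly parse a wall-clock time; if it falls in the DST gap and is
  // rejected, re-parse leniently and return the shifted time as a string.
  static String reparseInGap(String value) throws ParseException {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Pin the zone so the example behaves the same on any machine.
    fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    fmt.setLenient(false);
    try {
      fmt.parse(value);
      return "strict parse succeeded";
    } catch (ParseException e) {
      // 02:30 on the spring-forward day does not exist as a wall-clock time;
      // lenient mode interprets it as 02:30 standard time = 03:30 daylight time.
      fmt.setLenient(true);
      Date d = fmt.parse(value);
      fmt.setLenient(false);
      return fmt.format(d);
    }
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(reparseInGap("2020-03-08 02:30:00"));
  }
}
```

This is exactly why a load containing such timestamps fails with strict parsing, and why the diff above retries with `setLenient(true)` behind a property instead of rejecting the row.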