[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
Indhumathi27 commented on a change in pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#discussion_r476187983 ## File path: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala ## @@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with BeforeAndAfterAll { sql("RESET") } + test("TC_014-test mv after reset properties") { +sql("drop table if exists maintable") +sql("drop MATERIALIZED VIEW if exists mv1") +sql("CREATE TABLE maintable(empno int,empname string,projectcode int, projectjoindate " + +"Timestamp, projectenddate date,salary double) STORED AS carbondata") +sql("CREATE MATERIALIZED VIEW mv1 as select timeseries(projectenddate,'day'), sum" + +"(projectcode) from maintable group by timeseries(projectenddate,'day')") +sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 12:07:28','2016-05-20'," + +"15000.00") +sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 17:32:20','2016-05-20'," + +"25000.00") +sql("set carbon.input.segments.defualt.maintable=1") +checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), sum(projectcode) from " + Review comment: Can use 'TestUtil.verifyMVHit' method to verify if query hits mv or not This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV
Indhumathi27 commented on a change in pull request #3890: URL: https://github.com/apache/carbondata/pull/3890#discussion_r476188040 ## File path: integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala ## @@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with BeforeAndAfterAll { sql("RESET") } + test("TC_014-test mv after reset properties") { +sql("drop table if exists maintable") +sql("drop MATERIALIZED VIEW if exists mv1") +sql("CREATE TABLE maintable(empno int,empname string,projectcode int, projectjoindate " + +"Timestamp, projectenddate date,salary double) STORED AS carbondata") +sql("CREATE MATERIALIZED VIEW mv1 as select timeseries(projectenddate,'day'), sum" + +"(projectcode) from maintable group by timeseries(projectenddate,'day')") +sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 12:07:28','2016-05-20'," + +"15000.00") +sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 17:32:20','2016-05-20'," + +"25000.00") +sql("set carbon.input.segments.defualt.maintable=1") +checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), sum(projectcode) from " + + "maintable group by timeseries(projectenddate,'day')"), true, "mv1") +sql("reset") +checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), sum(projectcode) from " + Review comment: same as above comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r476186302 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ## @@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with setlenient carbon property for daylight " + + "saving time from different timezone") { +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true") +val defaultTimeZone = TimeZone.getDefault +TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) +sql("DROP TABLE IF EXISTS test_time") +sql( + """ + CREATE TABLE IF NOT EXISTS test_time + (ID Int, date Date, time Timestamp) + STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd', + 'timestampformat'='-MM-dd HH:mm') +""") +sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") +sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") +sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'") +checkAnswer( + sql("SELECT time FROM test_time WHERE ID = 1"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT time FROM test_time WHERE ID = 11"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +checkAnswer( + sql("SELECT time FROM test_time WHERE ID = 2"), + Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00"))) +) +sql("DROP TABLE test_time") +TimeZone.setDefault(defaultTimeZone) Review comment: Can move this to afterAll, as if testcase fails, TimeZone will be left Unset to default This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r476184514 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; +long timeValue; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true"); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static void validateTimeStampRange(Long timeValue) { +if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE +|| timeValue > DateDirectDictionaryGenerator.MAX_VALUE) { + throw new NumberFormatException("timestamp column data is not in valid range of: " Review comment: Can print timeValue also in the log This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183853 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; +long timeValue; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); Review comment: I think you dont have to move this line down. as setLenient for timestampFormatter variable is handled during initialisation This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679683246 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
Indhumathi27 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183063 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -1592,6 +1592,13 @@ private CarbonCommonConstants() { public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false"; + // Property to enable parsing the timestamp/date data with setLenient = true in load + // flow if it fails with parse invalid timestamp data. + @CarbonProperty(dynamicConfigurable = true) + public static final String CARBON_LOAD_SETLENIENT_ENABLE = "carbon.load.setlenient.enable"; Review comment: ```suggestion public static final String CARBON_LOAD_TIMESTAMP_SETLENIENT_ENABLE = "carbon.load.setlenient.enable"; ``` as this property is specific to timestamp This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene
dependabot[bot] commented on pull request #3456: URL: https://github.com/apache/carbondata/pull/3456#issuecomment-679464079 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk
dependabot[bot] commented on pull request #3447: URL: https://github.com/apache/carbondata/pull/3447#issuecomment-679464062 Dependabot tried to update this pull request, but something went wrong. We're looking into it, but in the meantime you can retry the update by commenting `@dependabot rebase`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] asfgit closed pull request #3891: [CARBONDATA-3889] Enable scala check style
asfgit closed pull request #3891: URL: https://github.com/apache/carbondata/pull/3891 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] xubo245 commented on pull request #3891: [CARBONDATA-3889] Enable scala check style
xubo245 commented on pull request #3891: URL: https://github.com/apache/carbondata/pull/3891#issuecomment-679459877 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] xubo245 removed a comment on pull request #3891: [CARBONDATA-3889] Enable scala check style
xubo245 removed a comment on pull request #3891: URL: https://github.com/apache/carbondata/pull/3891#issuecomment-678023289 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
QiangCai commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679445110 @ravipesala @akashrn5 we can merge it also. In the future, other cases can use CarbonTableOutputFormat and this issue will not happen. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679357060 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3853/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679350328 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2112/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (CARBONDATA-3915) Correction in the documentation for spark-shell
[ https://issues.apache.org/jira/browse/CARBONDATA-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3915. -- Fix Version/s: 2.1.0 Resolution: Fixed > Correction in the documentation for spark-shell > --- > > Key: CARBONDATA-3915 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3915 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 > Environment: 3 node ANT cluster one track. >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Spark-Shell program is not working, which is given in the > [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] > > Working program is given as below: > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val newSpark = > SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate() > newSpark.sql("drop table if exists hive_carbon").show > newSpark.sql("create table hive_carbon(id int, name string, scale decimal, > country string, salary double) STORED AS carbondata").show > newSpark.sql("LOAD DATA INPATH > 'hdfs://hacluster/user/prasanna/samplehive.csv' INTO TABLE hive_carbon").show > newSpark.sql("SELECT * FROM hive_carbon").show() > so could update the above working program in the > [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] page, > under the "Start Spark shell by running the following command in the Spark > directory" section. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] asfgit closed pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
asfgit closed pull request #3866: URL: https://github.com/apache/carbondata/pull/3866 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
kunal642 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679235346 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679216716 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3852/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
akashrn5 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475716479 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,18 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; +long timeValue; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + timeValue = dateToStr.getTime(); + validateTimeStampRange(timeValue); + return timeValue; } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true"); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); Review comment: i think these logs not required here, may be you can add only one log say after line 462, saying you had set true as parsing failed for invalid data and parsing finished, something meaningful like this and in line 466, you can say `"failed to parse data with lenience as true, setting back to default"` like this , it will look clean and good ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordActionTest.scala ## @@ -270,6 +274,53 @@ class BadRecordActionTest extends QueryTest { } } + test("test bad record FAIL with invalid timestamp range") { +val csvPath = s"$resourcesPath/badrecords/invalidTimeStampRange.csv" +val rows1 = new ListBuffer[Array[String]] +rows1 += Array("ID", "date", "time") +rows1 += Array("1", "2016-7-24", "342016-7-24 01:02:30") +createCSV(rows1, csvPath) +sql("DROP TABLE IF EXISTS test_time") +sql(""" + CREATE TABLE IF NOT EXISTS test_time + (ID Int, date Date, time Timestamp) + STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd', + 'timestampformat'='-MM-dd HH:mm:ss') +""") +val exception = intercept[Exception] { + sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/badrecords/invalidTimeStampRange.csv' " + + s"into table test_time options ('bad_records_action'='fail')") +} +assert(exception.getMessage + .contains( +"Data load failed due to bad record: The value with column name time and column data" + +" type TIMESTAMP is not a valid TIMESTAMP type.Please enable bad record logger to know" + +" the detail reason")) +sql("DROP TABLE IF EXISTS test_time") +deleteCSVFile(csvPath) + } + + def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = { +val out = new BufferedWriter(new FileWriter(csvPath)) +val writer: CSVWriter = new CSVWriter(out) + +for (row <- rows) { + writer.writeNext(row) +} +out.close() +writer.close() + } + + def deleteCSVFile(csvPath: String): Unit = { Review comment: no need to create new method only for delete, call the file delete API directly in test case as not used in many test case. ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala ## @@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterA } } + test("test load, update data with setlenient carbon property for daylight " + + "saving time from different timezone") { +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true") +val defaultTimeZone = TimeZone.getDefault +TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai")) +sql("DROP TABLE IF EXISTS test_time") +sql( + """ + CREATE TABLE IF NOT EXISTS test_time + (ID Int, date Date, time Timestamp) + STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd', + 'timestampformat'='-MM-dd HH:mm') +""") +sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' into table test_time") +sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ") +sql("update test_time set (time) =
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679210475 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2111/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
akashrn5 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475346485 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -1592,6 +1592,13 @@ private CarbonCommonConstants() { public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false"; + // Property to enable parsing the timestamp/date data with setLenient = true in load + // flow if it fails with parse invalid timestamp data. Review comment: can give the timestamp example and mention DST, so that we will know in future why this property was added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
CarbonDataQA1 commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679159418 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3851/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
CarbonDataQA1 commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679159237 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2110/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679141387 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679136250 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2109/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679135243 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3850/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
akashrn5 commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679123761 @QiangCai as @ravipesala said since we are not using this, may be once after #3856 is merged, we can check and close this i think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance
QiangCai commented on pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#issuecomment-679105065 why don't use row_v1 format in the streaming module? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ravipesala commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
ravipesala commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679098812 LGTM, Anyway we are not using it in CDC as per the PR https://github.com/apache/carbondata/pull/3856. This case not supposed to happen, it means source is not writing any data for more than 5 minutes. It should be some exceptional scenarios. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
QiangCai commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679096749 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ravipesala commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance
ravipesala commented on a change in pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#discussion_r475560861 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala ## @@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand( tuple._2.asJava) } } - Some(UpdateTableModel(true, trxMgr.getLatestTrx, -executorErrors, tuple._2, true)) + Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx, +executorErrors, tuple._2, loadAsNewSegment = true)) } else { None } -CarbonInsertIntoWithDf( - databaseNameOp = Some(carbonTable.getDatabaseName), +val dataFrame = loadDF.select(tableCols.map(col): _*) +CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName), tableName = carbonTable.getTableName, - options = Map("fileheader" -> header, "sort_scope" -> "nosort"), + options = Map("fileheader" -> header, "sort_scope" -> "no_sort"), Review comment: I agree with @ajantha-bhat we should use the target table sort scope. But this PR is only about performance we can cover this change in another PR with proper test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ravipesala commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance
ravipesala commented on pull request #3856: URL: https://github.com/apache/carbondata/pull/3856#issuecomment-679095323 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679080452 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679079380 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2108/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679078760 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3849/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent
kunal642 commented on a change in pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533764 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -2262,6 +2262,32 @@ private CarbonCommonConstants() { public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT = 500; + /** + * Configured property to enable/disable load failed segments in SI table during + * load/insert command. + */ + @CarbonProperty(dynamicConfigurable = true) + public static final String CARBON_LOAD_SI_REPAIR = "carbon.load.si.repair"; + + /** + * Default value for load failed segments in SI table during + * load/insert command. + */ + public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true"; + + /** + * Property to give a limit to the number of segments that are reloaded in the + * SI table in the FailedSegments listener. + */ + @CarbonProperty + public static final String CARBON_SI_REPAIR_LIMIT = "carbon.si.repair.limit"; + + /** + * Default value for the number of segments to be repaired during SI repair. + */ + public static final String CARBON_SI_REPAIR_LIMIT_DEFAULT = "1"; Review comment: No need for a default value, if the user has not set then we can go with the existing logic of repairing all the segments This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent
kunal642 commented on a change in pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533961 ## File path: core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java ## @@ -2076,4 +2076,16 @@ public static boolean isAuditEnabled() { public static void setAuditEnabled(boolean enabled) { getInstance().addProperty(CarbonCommonConstants.CARBON_ENABLE_AUDIT, String.valueOf(enabled)); } + + public boolean isSIRepairEnabled() { +String configuredValue = getProperty(CarbonCommonConstants.CARBON_LOAD_SI_REPAIR, +CarbonCommonConstants.CARBON_LOAD_SI_REPAIR_DEFAULT); +return Boolean.parseBoolean(configuredValue); + } + + public int maxSIRepairLimit() { Review comment: rename to "getMaxSIRepairLimit" This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent
kunal642 commented on a change in pull request #3894: URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533562 ## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ## @@ -2262,6 +2262,32 @@ private CarbonCommonConstants() { public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT = 500; + /** + * Configured property to enable/disable load failed segments in SI table during + * load/insert command. + */ + @CarbonProperty(dynamicConfigurable = true) + public static final String CARBON_LOAD_SI_REPAIR = "carbon.load.si.repair"; + + /** + * Default value for load failed segments in SI table during + * load/insert command. + */ + public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true"; + + /** + * Property to give a limit to the number of segments that are reloaded in the + * SI table in the FailedSegments listener. + */ + @CarbonProperty + public static final String CARBON_SI_REPAIR_LIMIT = "carbon.si.repair.limit"; Review comment: I think its better to have these 2 properties at table level instead of global. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
CarbonDataQA1 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679070825 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2107/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (CARBONDATA-3933) insert, desc throws error when the column name contains special character
[ https://issues.apache.org/jira/browse/CARBONDATA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3933. -- Fix Version/s: 2.1.0 Resolution: Fixed > insert, desc throws error when the column name contains special character > - > > Key: CARBONDATA-3933 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3933 > Project: CarbonData > Issue Type: Bug >Reporter: Akash R Nilugal >Priority: Minor > Fix For: 2.1.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > insert, desc throws error when the column name contains special character > sql("drop table if exists special_char") > sql("create table special_char(`i#d` string, `nam(e`
[GitHub] [carbondata] asfgit closed pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
asfgit closed pull request #3862: URL: https://github.com/apache/carbondata/pull/3862 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
CarbonDataQA1 commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679068984 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3847/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
CarbonDataQA1 commented on pull request #3897: URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679068472 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2106/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
CarbonDataQA1 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679067484 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3848/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
kunal642 commented on pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#issuecomment-679065415 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#issuecomment-679040734 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
VenuReddy2103 commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475494371 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info("Changing " + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +long minValue = DateDirectDictionaryGenerator.MIN_VALUE; +long maxValue = DateDirectDictionaryGenerator.MAX_VALUE; +if (timeValue < minValue || timeValue > maxValue) { + if (LOGGER.isDebugEnabled()) { +LOGGER.debug("Value for timestamp type column is not in valid range."); + } + throw new NumberFormatException("Value for timestamp type column is not in valid range."); Review comment: I think, number format excepiton is ok. Also our exception message shows the correct error string that it is not in the valid range. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679027809 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679025574 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2105/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679024682 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3846/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
akkio-97 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679015969 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai opened a new pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue
QiangCai opened a new pull request #3897: URL: https://github.com/apache/carbondata/pull/3897 ### Why is this PR needed? CDC merge tasks are blocked when data loading use CarbonTableOutputFormat.getRecordWriter method. Because the poll method of the queue is time out in some cases. ### What changes were proposed in this PR? If the output is not closed, it will poll a row batch in the loops till it gets a not null batch. If the output is closed, it will break the loop. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan980 commented on pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if
Karan980 commented on pull request #3893: URL: https://github.com/apache/carbondata/pull/3893#issuecomment-679002255 @Indhumathi27 Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3959) In indexserver set executor lRU cache size to 70% of total executor memory
Karan created CARBONDATA-3959: - Summary: In indexserver set executor lRU cache size to 70% of total executor memory Key: CARBONDATA-3959 URL: https://issues.apache.org/jira/browse/CARBONDATA-3959 Project: CarbonData Issue Type: Improvement Components: core, spark-integration Affects Versions: 2.0.0 Reporter: Karan Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if executor LRU cache size is not configured. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3958) CDC Merge task can't finish
David Cai created CARBONDATA-3958: - Summary: CDC Merge task can't finish Key: CARBONDATA-3958 URL: https://issues.apache.org/jira/browse/CARBONDATA-3958 Project: CarbonData Issue Type: Bug Reporter: David Cai # The merge tasks take a long time and can't finish in some cases. # We find warning "This scenario should not happen" in log -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
CarbonDataQA1 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-678976496 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3845/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] QiangCai commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
QiangCai commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475413812 ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala ## @@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll { override def beforeAll() { +new MockUp[MockClassForAlterRevertTests]() { Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
CarbonDataQA1 commented on pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#issuecomment-678974202 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2104/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI
Karan980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678965944 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%
akashrn5 commented on a change in pull request #3862: URL: https://github.com/apache/carbondata/pull/3862#discussion_r475390842 ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala ## @@ -30,6 +32,13 @@ import org.apache.carbondata.spark.exception.ProcessMetaDataException class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll { override def beforeAll() { +new MockUp[MockClassForAlterRevertTests]() { Review comment: @QiangCai Since CarbonSessionCatalogUtil is a singleton object in scala, we cant mock its method directly, so i think this should be fine right for now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] marchpure removed a comment on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
marchpure removed a comment on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678949480 > @kumarvishal09 @QiangCai please help to review this PR Why can't I @Kejian-Li? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] marchpure commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
marchpure commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678949480 > @kumarvishal09 @QiangCai please help to review this PR Why can't I @Kejian-Li? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678946442 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2102/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI
CarbonDataQA1 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678945982 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3843/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] brijoobopanna commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto
brijoobopanna commented on pull request #3887: URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678945401 @kumarvishal09 @QiangCai please help to review this PR This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475378412 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ## @@ -816,10 +816,13 @@ object CarbonDataRDDFactory { val partitionByRdd = keyRDD.partitionBy( new SegmentPartitioner(segmentIdIndex, segmentUpdateParallelism)) + val carbonSessionInfoBroadcast = sqlContext.sparkSession.sparkContext Review comment: In case of normal load flow, NewCarbonDataLoadRDD extends carbonRDD. While initializing, we get carbonSessionInfo from current thread and in compute of carbonRDD we set to the thread by `ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)`. We can either do the same here or broadcast. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377552 ## File path: integration/spark/src/test/resources/badrecords/invalidTimeStampRange.csv ## @@ -0,0 +1,2 @@ +ID,date,starttime,country,name,phonetype,serialname,salary Review comment: ok, made changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377258 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377325 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info("Changing " + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { Review comment: done ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info("Changing " + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +long minValue = DateDirectDictionaryGenerator.MIN_VALUE; +long maxValue = DateDirectDictionaryGenerator.MAX_VALUE; +if (timeValue < minValue || timeValue > maxValue) { + if (LOGGER.isDebugEnabled()) { +LOGGER.debug("Value for timestamp type column is not in valid range."); + } + throw new NumberFormatException("Value for timestamp type column is not in valid range."); Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes
ShreelekhyaG commented on a change in pull request #3896: URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377187 ## File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java ## @@ -435,19 +436,48 @@ public static Object getDataDataTypeForNoDictionaryColumn(String dimensionValue, private static Object parseTimestamp(String dimensionValue, String dateFormat) { Date dateToStr; -DateFormat dateFormatter; +DateFormat dateFormatter = null; try { if (null != dateFormat && !dateFormat.trim().isEmpty()) { dateFormatter = new SimpleDateFormat(dateFormat); -dateFormatter.setLenient(false); } else { dateFormatter = timestampFormatter.get(); } + dateFormatter.setLenient(false); dateToStr = dateFormatter.parse(dimensionValue); - return dateToStr.getTime(); + return validateTimeStampRange(dateToStr.getTime()); } catch (ParseException e) { - throw new NumberFormatException(e.getMessage()); + // If the parsing fails, try to parse again with setLenient to true if the property is set + if (CarbonProperties.getInstance().isSetLenientEnabled()) { +try { + LOGGER.info("Changing setLenient to true for TimeStamp: " + dimensionValue); + dateFormatter.setLenient(true); + dateToStr = dateFormatter.parse(dimensionValue); + LOGGER.info("Changing " + dimensionValue + " to " + dateToStr); + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + return validateTimeStampRange(dateToStr.getTime()); +} catch (ParseException ex) { + dateFormatter.setLenient(false); + LOGGER.info("Changing setLenient back to false"); + throw new NumberFormatException(ex.getMessage()); +} + } else { +throw new NumberFormatException(e.getMessage()); + } +} + } + + private static Long validateTimeStampRange(Long timeValue) { +long minValue = DateDirectDictionaryGenerator.MIN_VALUE; +long maxValue = DateDirectDictionaryGenerator.MAX_VALUE; +if (timeValue < minValue || timeValue > maxValue) { + if (LOGGER.isDebugEnabled()) { Review comment: ok, removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akkio-97 commented on a change in pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell
akkio-97 commented on a change in pull request #3866: URL: https://github.com/apache/carbondata/pull/3866#discussion_r475362156 ## File path: docs/hive-guide.md ## @@ -52,16 +52,11 @@ $HADOOP_HOME/bin/hadoop fs -put sample.csv /sample.csv ``` import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ -val rootPath = "hdfs:///user/hadoop/carbon" -val storeLocation = s"$rootPath/store" -val warehouse = s"$rootPath/warehouse" -val metaStoreDB = s"$rootPath/metastore_db" - -val carbon = SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION, storeLocation).getOrCreateCarbonSession(storeLocation, metaStoreDB) - -carbon.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata") -carbon.sql("LOAD DATA INPATH '/sample.csv' INTO TABLE hive_carbon") -scala>carbon.sql("SELECT * FROM hive_carbon").show() +val newSpark = SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate() +newSpark.sql("drop table if exists hive_carbon").show +newSpark.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata").show +newSpark.sql("/sample.csv INTO TABLE hive_carbon").show Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org