[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3890:
URL: https://github.com/apache/carbondata/pull/3890#discussion_r476187983



##
File path: 
integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
##
@@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("RESET")
   }
 
+  test("TC_014-test mv after reset properties") {
+sql("drop table if exists maintable")
+sql("drop MATERIALIZED VIEW if exists mv1")
+sql("CREATE TABLE maintable(empno int,empname string,projectcode int, 
projectjoindate " +
+"Timestamp, projectenddate date,salary double) STORED AS carbondata")
+sql("CREATE MATERIALIZED VIEW mv1 as select 
timeseries(projectenddate,'day'), sum" +
+"(projectcode) from maintable group by 
timeseries(projectenddate,'day')")
+sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 
12:07:28','2016-05-20'," +
+"15000.00")
+sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 
17:32:20','2016-05-20'," +
+"25000.00")
+sql("set carbon.input.segments.defualt.maintable=1")
+checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), 
sum(projectcode) from " +

Review comment:
   Can use 'TestUtil.verifyMVHit' method to verify if query hits mv or not





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3890: [CARBONDATA-3952] After reset query not hitting MV

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3890:
URL: https://github.com/apache/carbondata/pull/3890#discussion_r476188040



##
File path: 
integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
##
@@ -252,6 +252,27 @@ class SetParameterTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("RESET")
   }
 
+  test("TC_014-test mv after reset properties") {
+sql("drop table if exists maintable")
+sql("drop MATERIALIZED VIEW if exists mv1")
+sql("CREATE TABLE maintable(empno int,empname string,projectcode int, 
projectjoindate " +
+"Timestamp, projectenddate date,salary double) STORED AS carbondata")
+sql("CREATE MATERIALIZED VIEW mv1 as select 
timeseries(projectenddate,'day'), sum" +
+"(projectcode) from maintable group by 
timeseries(projectenddate,'day')")
+sql("insert into maintable select 1000,'PURUJIT',00012,'2015-07-26 
12:07:28','2016-05-20'," +
+"15000.00")
+sql("insert into maintable select 1001,'PANKAJ',00010,'2015-07-26 
17:32:20','2016-05-20'," +
+"25000.00")
+sql("set carbon.input.segments.defualt.maintable=1")
+checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), 
sum(projectcode) from " +
+   "maintable group by timeseries(projectenddate,'day')"), 
true, "mv1")
+sql("reset")
+checkExistence(sql("EXPLAIN select timeseries(projectenddate,'day'), 
sum(projectcode) from " +

Review comment:
   same as above comment





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476186302



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
 }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+   "saving time from different timezone") {
+CarbonProperties.getInstance().addProperty(
+  CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+val defaultTimeZone = TimeZone.getDefault
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm')
+""")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 1"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 11"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 2"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+sql("DROP TABLE test_time")
+TimeZone.setDefault(defaultTimeZone)

Review comment:
   Can move this to afterAll, as if testcase fails, TimeZone will be left 
Unset to default





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476184514



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,18 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
+long timeValue;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true");
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+|| timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+  throw new NumberFormatException("timestamp column data is not in valid 
range of: "

Review comment:
   Can print timeValue also in the log





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183853



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,18 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
+long timeValue;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);

Review comment:
   I think you dont have to move this line down. as setLenient for 
timestampFormatter variable is handled during initialisation





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


Karan-c980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679683246


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


Indhumathi27 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476183063



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true 
in load
+  // flow if it fails with parse invalid timestamp data.
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SETLENIENT_ENABLE = 
"carbon.load.setlenient.enable";

Review comment:
   ```suggestion
 public static final String CARBON_LOAD_TIMESTAMP_SETLENIENT_ENABLE = 
"carbon.load.setlenient.enable";
   ```
   as this property is specific to timestamp





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] dependabot[bot] commented on pull request #3456: Bump solr.version from 6.3.0 to 8.3.0 in /datamap/lucene

2020-08-24 Thread GitBox


dependabot[bot] commented on pull request #3456:
URL: https://github.com/apache/carbondata/pull/3456#issuecomment-679464079


   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] dependabot[bot] commented on pull request #3447: Bump dep.jackson.version from 2.6.5 to 2.10.1 in /store/sdk

2020-08-24 Thread GitBox


dependabot[bot] commented on pull request #3447:
URL: https://github.com/apache/carbondata/pull/3447#issuecomment-679464062


   Dependabot tried to update this pull request, but something went wrong. 
We're looking into it, but in the meantime you can retry the update by 
commenting `@dependabot rebase`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #3891: [CARBONDATA-3889] Enable scala check style

2020-08-24 Thread GitBox


asfgit closed pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] xubo245 commented on pull request #3891: [CARBONDATA-3889] Enable scala check style

2020-08-24 Thread GitBox


xubo245 commented on pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891#issuecomment-679459877







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] xubo245 removed a comment on pull request #3891: [CARBONDATA-3889] Enable scala check style

2020-08-24 Thread GitBox


xubo245 removed a comment on pull request #3891:
URL: https://github.com/apache/carbondata/pull/3891#issuecomment-678023289







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


QiangCai commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679445110


   @ravipesala @akashrn5 we can merge it also. In the future,  other cases can 
use CarbonTableOutputFormat and this issue will not happen.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679357060


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3853/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679350328


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2112/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-3915) Correction in the documentation for spark-shell

2020-08-24 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3915.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Correction in the documentation for spark-shell
> ---
>
> Key: CARBONDATA-3915
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3915
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
> Environment: 3 node ANT cluster one track.
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Spark-Shell program is not working, which is given in the 
> [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md]
>  
> Working program is given as below:
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
>  val newSpark = 
> SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate()
>  newSpark.sql("drop table if exists hive_carbon").show
> newSpark.sql("create table hive_carbon(id int, name string, scale decimal, 
> country string, salary double) STORED AS carbondata").show
> newSpark.sql("LOAD DATA INPATH 
> 'hdfs://hacluster/user/prasanna/samplehive.csv' INTO TABLE hive_carbon").show
> newSpark.sql("SELECT * FROM hive_carbon").show()
> so could update the above working program in the 
> [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] page, 
> under the "Start Spark shell by running the following command in the Spark 
> directory" section.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


asfgit closed pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


kunal642 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679235346


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679216716


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3852/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


akashrn5 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475716479



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,18 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
+long timeValue;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true");
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");

Review comment:
   i think these logs not required here, may be you can add only one log 
say after line 462, saying you had set true as parsing failed for invalid data 
and parsing finished, something meaningful like this and in line 466, you can 
say `"failed to parse data with lenience as true, setting back to default"` 
like this , it will look clean and good

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/badrecordloger/BadRecordActionTest.scala
##
@@ -270,6 +274,53 @@ class BadRecordActionTest extends QueryTest {
 }
   }
 
+  test("test bad record FAIL with invalid timestamp range") {
+val csvPath = s"$resourcesPath/badrecords/invalidTimeStampRange.csv"
+val rows1 = new ListBuffer[Array[String]]
+rows1 += Array("ID", "date", "time")
+rows1 += Array("1", "2016-7-24", "342016-7-24 01:02:30")
+createCSV(rows1, csvPath)
+sql("DROP TABLE IF EXISTS test_time")
+sql("""
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm:ss')
+""")
+val exception = intercept[Exception] {
+  sql(s" LOAD DATA LOCAL INPATH 
'$resourcesPath/badrecords/invalidTimeStampRange.csv' " +
+  s"into table test_time options ('bad_records_action'='fail')")
+}
+assert(exception.getMessage
+  .contains(
+"Data load failed due to bad record: The value with column name time 
and column data" +
+" type TIMESTAMP is not a valid TIMESTAMP type.Please enable bad 
record logger to know" +
+" the detail reason"))
+sql("DROP TABLE IF EXISTS test_time")
+deleteCSVFile(csvPath)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+val out = new BufferedWriter(new FileWriter(csvPath))
+val writer: CSVWriter = new CSVWriter(out)
+
+for (row <- rows) {
+  writer.writeNext(row)
+}
+out.close()
+writer.close()
+  }
+
+  def deleteCSVFile(csvPath: String): Unit = {

Review comment:
   no need to create new method only for delete, call the file delete API 
directly in test case as not used in many test case.

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
 }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+   "saving time from different timezone") {
+CarbonProperties.getInstance().addProperty(
+  CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+val defaultTimeZone = TimeZone.getDefault
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm')
+""")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = 

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679210475


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2111/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


akashrn5 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475346485



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true 
in load
+  // flow if it fails with parse invalid timestamp data.

Review comment:
   can give the timestamp example and mention DST, so that we will know in 
future why this property was added.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679159418


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3851/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679159237


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2110/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


Karan-c980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679141387


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679136250


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2109/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679135243


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3850/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


akashrn5 commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679123761


   @QiangCai as @ravipesala said since we are not using this, may be once after 
#3856 is merged, we can check and close this i think.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-24 Thread GitBox


QiangCai commented on pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#issuecomment-679105065


   why don't use row_v1 format in the streaming module?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ravipesala commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


ravipesala commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679098812


   LGTM,
   Anyway we are not using it in CDC as per the PR 
https://github.com/apache/carbondata/pull/3856.
   This case not supposed to happen, it means source is not writing any data 
for more than 5 minutes. It should be some exceptional scenarios.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


QiangCai commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679096749


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ravipesala commented on a change in pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-24 Thread GitBox


ravipesala commented on a change in pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#discussion_r475560861



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/merge/CarbonMergeDataSetCommand.scala
##
@@ -194,29 +210,32 @@ case class CarbonMergeDataSetCommand(
 tuple._2.asJava)
 }
   }
-  Some(UpdateTableModel(true, trxMgr.getLatestTrx,
-executorErrors, tuple._2, true))
+  Some(UpdateTableModel(isUpdate = true, trxMgr.getLatestTrx,
+executorErrors, tuple._2, loadAsNewSegment = true))
 } else {
   None
 }
 
-CarbonInsertIntoWithDf(
-  databaseNameOp = Some(carbonTable.getDatabaseName),
+val dataFrame = loadDF.select(tableCols.map(col): _*)
+CarbonInsertIntoCommand(databaseNameOp = Some(carbonTable.getDatabaseName),
   tableName = carbonTable.getTableName,
-  options = Map("fileheader" -> header, "sort_scope" -> "nosort"),
+  options = Map("fileheader" -> header, "sort_scope" -> "no_sort"),

Review comment:
   I agree with @ajantha-bhat  we should use the target table sort scope. 
But this PR is only about performance we can cover this change in another PR 
with proper test





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ravipesala commented on pull request #3856: [CARBONDATA-3929]Improve CDC performance

2020-08-24 Thread GitBox


ravipesala commented on pull request #3856:
URL: https://github.com/apache/carbondata/pull/3856#issuecomment-679095323


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679080452


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679079380


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2108/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679078760


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3849/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent

2020-08-24 Thread GitBox


kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533764



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2262,6 +2262,32 @@ private CarbonCommonConstants() {
   public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT =
   500;
 
+  /**
+   * Configured property to enable/disable load failed segments in SI table 
during
+   * load/insert command.
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SI_REPAIR =  "carbon.load.si.repair";
+
+  /**
+   * Default value for load failed segments in SI table during
+   * load/insert command.
+   */
+  public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true";
+
+  /**
+   * Property to give a limit to the number of segments that are reloaded in 
the
+   * SI table in the FailedSegments listener.
+   */
+  @CarbonProperty
+  public static final String CARBON_SI_REPAIR_LIMIT =  
"carbon.si.repair.limit";
+
+  /**
+   * Default value for the number of segments to be repaired during SI repair.
+   */
+  public static final String CARBON_SI_REPAIR_LIMIT_DEFAULT = "1";

Review comment:
   No need for a default value, if the user has not set then we can go with 
the existing logic of repairing all the segments





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent

2020-08-24 Thread GitBox


kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533961



##
File path: 
core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
##
@@ -2076,4 +2076,16 @@ public static boolean isAuditEnabled() {
   public static void setAuditEnabled(boolean enabled) {
 getInstance().addProperty(CarbonCommonConstants.CARBON_ENABLE_AUDIT, 
String.valueOf(enabled));
   }
+
+  public boolean isSIRepairEnabled() {
+String configuredValue = 
getProperty(CarbonCommonConstants.CARBON_LOAD_SI_REPAIR,
+CarbonCommonConstants.CARBON_LOAD_SI_REPAIR_DEFAULT);
+return Boolean.parseBoolean(configuredValue);
+  }
+
+  public int maxSIRepairLimit() {

Review comment:
   rename to "getMaxSIRepairLimit"





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3894: [CARBONDATA-3957] Added property to enable disable the repair logic and provide a limit to number of segments in SILoadEvent

2020-08-24 Thread GitBox


kunal642 commented on a change in pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894#discussion_r475533562



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2262,6 +2262,32 @@ private CarbonCommonConstants() {
   public static final int CARBON_INDEX_SERVER_WORKER_THREADS_DEFAULT =
   500;
 
+  /**
+   * Configured property to enable/disable load failed segments in SI table 
during
+   * load/insert command.
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SI_REPAIR =  "carbon.load.si.repair";
+
+  /**
+   * Default value for load failed segments in SI table during
+   * load/insert command.
+   */
+  public static final String CARBON_LOAD_SI_REPAIR_DEFAULT = "true";
+
+  /**
+   * Property to give a limit to the number of segments that are reloaded in 
the
+   * SI table in the FailedSegments listener.
+   */
+  @CarbonProperty
+  public static final String CARBON_SI_REPAIR_LIMIT =  
"carbon.si.repair.limit";

Review comment:
   I think its better to have these 2 properties at table level instead of 
global.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679070825


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2107/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-3933) insert, desc throws error when the column name contains special character

2020-08-24 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3933.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> insert, desc throws error when the column name contains special character
> -
>
> Key: CARBONDATA-3933
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3933
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> insert, desc throws error when the column name contains special character
> sql("drop table if exists special_char")
> sql("create table special_char(`i#d` string, `nam(e` 

[GitHub] [carbondata] asfgit closed pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%

2020-08-24 Thread GitBox


asfgit closed pull request #3862:
URL: https://github.com/apache/carbondata/pull/3862


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679068984


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3847/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897#issuecomment-679068472


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2106/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679067484


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3848/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%

2020-08-24 Thread GitBox


kunal642 commented on pull request #3862:
URL: https://github.com/apache/carbondata/pull/3862#issuecomment-679065415


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


VenuReddy2103 commented on pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#issuecomment-679040734


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


VenuReddy2103 commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475494371



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+if (timeValue < minValue || timeValue > maxValue) {
+  if (LOGGER.isDebugEnabled()) {
+LOGGER.debug("Value for timestamp type column is not in valid range.");
+  }
+  throw new NumberFormatException("Value for timestamp type column is not 
in valid range.");

Review comment:
   I think, number format excepiton is ok. Also our exception message shows 
the correct error string that it is not in the valid range.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679027809


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679025574


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2105/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-679024682


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3846/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akkio-97 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


akkio-97 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-679015969


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai opened a new pull request #3897: [CARBONDATA-3958] Fix CDC merge tasks can't finish issue

2020-08-24 Thread GitBox


QiangCai opened a new pull request #3897:
URL: https://github.com/apache/carbondata/pull/3897


### Why is this PR needed?
CDC merge tasks are blocked when data loading use 
CarbonTableOutputFormat.getRecordWriter method.
Because the poll method of the queue is time out in some cases.

### What changes were proposed in this PR?
If the output is not closed, it will poll a row batch in the loops till it 
gets a not null batch.
If the output is closed,  it will break the loop.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3893: [CARBONDATA-3959] Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if

2020-08-24 Thread GitBox


Karan980 commented on pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#issuecomment-679002255


   @Indhumathi27 Done



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3959) In indexserver set executor lRU cache size to 70% of total executor memory

2020-08-24 Thread Karan (Jira)
Karan created CARBONDATA-3959:
-

 Summary: In indexserver set executor lRU cache size to 70% of 
total executor memory
 Key: CARBONDATA-3959
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3959
 Project: CarbonData
  Issue Type: Improvement
  Components: core, spark-integration
Affects Versions: 2.0.0
Reporter: Karan


Added new property to set the value of executor LRU cache size to 70% of the 
total executor memory in IndexServer, if executor LRU cache size is not 
configured.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3958) CDC Merge task can't finish

2020-08-24 Thread David Cai (Jira)
David Cai created CARBONDATA-3958:
-

 Summary: CDC Merge task can't finish
 Key: CARBONDATA-3958
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3958
 Project: CarbonData
  Issue Type: Bug
Reporter: David Cai


# The merge tasks take a long time and can't finish in some cases.
 # We find warning "This scenario should not happen" in log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-678976496


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3845/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%

2020-08-24 Thread GitBox


QiangCai commented on a change in pull request #3862:
URL: https://github.com/apache/carbondata/pull/3862#discussion_r475413812



##
File path: 
integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala
##
@@ -30,6 +32,13 @@ import 
org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
 
   override def beforeAll() {
+new MockUp[MockClassForAlterRevertTests]() {

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#issuecomment-678974202


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2104/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


Karan980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678965944


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3862: [CARBONDATA-3933]Fix DDL/DML failures after table is created with column names having special characters like #,\,%

2020-08-24 Thread GitBox


akashrn5 commented on a change in pull request #3862:
URL: https://github.com/apache/carbondata/pull/3862#discussion_r475390842



##
File path: 
integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/AlterTableRevertTestCase.scala
##
@@ -30,6 +32,13 @@ import 
org.apache.carbondata.spark.exception.ProcessMetaDataException
 class AlterTableRevertTestCase extends QueryTest with BeforeAndAfterAll {
 
   override def beforeAll() {
+new MockUp[MockClassForAlterRevertTests]() {

Review comment:
   @QiangCai Since CarbonSessionCatalogUtil is a singleton object in scala, 
we cant mock its method directly, so i think this should be fine right for now?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure removed a comment on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-24 Thread GitBox


marchpure removed a comment on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678949480


   > @kumarvishal09 @QiangCai please help to review this PR
   
   Why can't I @Kejian-Li?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-24 Thread GitBox


marchpure commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678949480


   > @kumarvishal09 @QiangCai please help to review this PR
   
   Why can't I @Kejian-Li?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678946442


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2102/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3876: TestingCI

2020-08-24 Thread GitBox


CarbonDataQA1 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-678945982


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3843/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] brijoobopanna commented on pull request #3887: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-24 Thread GitBox


brijoobopanna commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-678945401


   @kumarvishal09 @QiangCai please help to review this PR



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475378412



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -816,10 +816,13 @@ object CarbonDataRDDFactory {
   val partitionByRdd = keyRDD.partitionBy(
 new SegmentPartitioner(segmentIdIndex, segmentUpdateParallelism))
 
+  val carbonSessionInfoBroadcast = sqlContext.sparkSession.sparkContext

Review comment:
   In case of normal load flow, NewCarbonDataLoadRDD extends carbonRDD. 
While initializing, we get carbonSessionInfo from current thread and in compute 
of carbonRDD we set to the thread by 
`ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)`. We can either 
do the same here or broadcast.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377552



##
File path: 
integration/spark/src/test/resources/badrecords/invalidTimeStampRange.csv
##
@@ -0,0 +1,2 @@
+ID,date,starttime,country,name,phonetype,serialname,salary

Review comment:
   ok, made changes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377258



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377325



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {

Review comment:
   done

##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+if (timeValue < minValue || timeValue > maxValue) {
+  if (LOGGER.isDebugEnabled()) {
+LOGGER.debug("Value for timestamp type column is not in valid range.");
+  }
+  throw new NumberFormatException("Value for timestamp type column is not 
in valid range.");

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-24 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377187



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+if (timeValue < minValue || timeValue > maxValue) {
+  if (LOGGER.isDebugEnabled()) {

Review comment:
   ok, removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akkio-97 commented on a change in pull request #3866: [CARBONDATA-3915] Correction in the documentation for spark-shell

2020-08-24 Thread GitBox


akkio-97 commented on a change in pull request #3866:
URL: https://github.com/apache/carbondata/pull/3866#discussion_r475362156



##
File path: docs/hive-guide.md
##
@@ -52,16 +52,11 @@ $HADOOP_HOME/bin/hadoop fs -put sample.csv /sample.csv
 ```
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.CarbonSession._
-val rootPath = "hdfs:///user/hadoop/carbon"
-val storeLocation = s"$rootPath/store"
-val warehouse = s"$rootPath/warehouse"
-val metaStoreDB = s"$rootPath/metastore_db"
-
-val carbon = 
SparkSession.builder().enableHiveSupport().config("spark.sql.warehouse.dir", 
warehouse).config(org.apache.carbondata.core.constants.CarbonCommonConstants.STORE_LOCATION,
 storeLocation).getOrCreateCarbonSession(storeLocation, metaStoreDB)
-
-carbon.sql("create table hive_carbon(id int, name string, scale decimal, 
country string, salary double) STORED AS carbondata")
-carbon.sql("LOAD DATA INPATH '/sample.csv' INTO TABLE 
hive_carbon")
-scala>carbon.sql("SELECT * FROM hive_carbon").show()
+val newSpark = 
SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate()
+newSpark.sql("drop table if exists hive_carbon").show
+newSpark.sql("create table hive_carbon(id int, name string, scale decimal, 
country string, salary double) STORED AS carbondata").show
+newSpark.sql("/sample.csv INTO TABLE hive_carbon").show

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org