[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080590



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String 
dimensionValue, String dateFormat) {
 dateFormatter = timestampFormatter.get();
   }
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  LOGGER.info("Parsed data with lenience as true, setting back to 
default mode");
+  return timeValue;
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-26 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477080281



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String 
dimensionValue, String dateFormat) {
 dateFormatter = timestampFormatter.get();
   }
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  dateFormatter.setLenient(true);

Review comment:
   added an example.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059876



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##
@@ -68,4 +71,32 @@ object BadRecordUtil {
 badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
 
FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+val out = new BufferedWriter(new FileWriter(csvPath))

Review comment:
   could not find proper try-with-resource in scala. keeping it the same.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059942



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -444,9 +446,38 @@ private static Object parseTimestamp(String 
dimensionValue, String dateFormat) {
 dateFormatter = timestampFormatter.get();
   }
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  LOGGER.info("Parsed data with lenience as true, setting back to 
default mode");
+  return timeValue;
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Failed to parse data with lenience as true, setting 
back to default mode");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+|| timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+  throw new NumberFormatException(
+  "timestamp column data value: " + timeValue + "is not in valid " + 
"range of: "

Review comment:
   ok changed

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,15 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true 
in load
+  // flow if it fails with parse invalid timestamp data. (example: 1941-03-15 
00:00:00
+  // is valid time in Asia/Calcutta zone and is invalid and will fail to parse 
in Asia/Shanghai
+  // zone as DST is observed and clocks were turned forward 1 hour to 
1941-03-15 01:00:00)
+  @CarbonProperty(dynamicConfigurable = true) public static final String

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r477059456



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/util/BadRecordUtil.scala
##
@@ -68,4 +71,32 @@ object BadRecordUtil {
 badRecordLocation = badRecordLocation + "/" + dbName + "/" + tableName
 
FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(badRecordLocation))
   }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {
+val out = new BufferedWriter(new FileWriter(csvPath))
+val writer: CSVWriter = new CSVWriter(out)
+try {
+  for (row <- rows) {
+writer.writeNext(row)
+  }
+}
+catch {
+  case e: Exception =>
+Assert.fail(e.getMessage)
+}
+finally {
+  out.close()
+  writer.close()
+}
+  }
+
+  def deleteCSVFile(csvPath: String): Unit = {

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476269555



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
 }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+   "saving time from different timezone") {
+CarbonProperties.getInstance().addProperty(
+  CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+val defaultTimeZone = TimeZone.getDefault
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm')
+""")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 1"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 11"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 2"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+sql("DROP TABLE test_time")
+TimeZone.setDefault(defaultTimeZone)
+CarbonProperties.getInstance().removeProperty(
+  CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE)
+  }
+
+  test("test load, update data with setlenient session level property for 
daylight " +
+   "saving time from different timezone") {
+sql("set carbon.load.setlenient.enable = true")
+val defaultTimeZone = TimeZone.getDefault
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm')
+""")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 1"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 11"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 2"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+sql("DROP TABLE test_time")
+TimeZone.setDefault(defaultTimeZone)
+defaultConfig()
+sqlContext.sparkSession.conf.unset("carbon.load.setlenient.enable")
+  }
+
+  def generateCSVFile(): Unit = {
+val rows1 = new ListBuffer[Array[String]]
+rows1 += Array("ID", "date", "time")
+rows1 += Array("1", "1941-3-15", "1941-3-15 00:00:00")
+rows1 += Array("2", "2016-7-24", "2016-7-24 01:02:30")
+createCSV(rows1, csvPath1)
+  }
+
+  def createCSV(rows: ListBuffer[Array[String]], csvPath: String): Unit = {

Review comment:
   moved createCSV and deleteCSV to a test util class





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244994



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true 
in load
+  // flow if it fails with parse invalid timestamp data.

Review comment:
   ok added example





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476243946



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
##
@@ -306,7 +315,107 @@ class TestLoadDataWithDiffTimestampFormat extends 
QueryTest with BeforeAndAfterA
 }
   }
 
+  test("test load, update data with setlenient carbon property for daylight " +
+   "saving time from different timezone") {
+CarbonProperties.getInstance().addProperty(
+  CarbonCommonConstants.CARBON_LOAD_SETLENIENT_ENABLE, "true")
+val defaultTimeZone = TimeZone.getDefault
+TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
+sql("DROP TABLE IF EXISTS test_time")
+sql(
+  """
+   CREATE TABLE IF NOT EXISTS test_time
+   (ID Int, date Date, time Timestamp)
+   STORED AS carbondata TBLPROPERTIES('dateformat'='-MM-dd',
+   'timestampformat'='-MM-dd HH:mm')
+""")
+sql(s" LOAD DATA LOCAL INPATH '$resourcesPath/differentZoneTimeStamp.csv' 
into table test_time")
+sql(s"insert into test_time select 11, '2016-7-24', '1941-3-15 00:00:00' ")
+sql("update test_time set (time) = ('1941-3-15 00:00:00') where ID='2'")
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 1"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 11"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+checkAnswer(
+  sql("SELECT time FROM test_time WHERE ID = 2"),
+  Seq(Row(Timestamp.valueOf("1941-3-15 01:00:00")))
+)
+sql("DROP TABLE test_time")
+TimeZone.setDefault(defaultTimeZone)

Review comment:
   ok done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244034



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,18 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
+long timeValue;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true");
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  timeValue = dateToStr.getTime();
+  validateTimeStampRange(timeValue);
+  return timeValue;
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static void validateTimeStampRange(Long timeValue) {
+if (timeValue < DateDirectDictionaryGenerator.MIN_VALUE
+|| timeValue > DateDirectDictionaryGenerator.MAX_VALUE) {
+  throw new NumberFormatException("timestamp column data is not in valid 
range of: "

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-25 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r476244302



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the timestamp/date data with setLenient = true 
in load
+  // flow if it fails with parse invalid timestamp data.
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_LOAD_SETLENIENT_ENABLE = 
"carbon.load.setlenient.enable";

Review comment:
   okay





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475378412



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -816,10 +816,13 @@ object CarbonDataRDDFactory {
   val partitionByRdd = keyRDD.partitionBy(
 new SegmentPartitioner(segmentIdIndex, segmentUpdateParallelism))
 
+  val carbonSessionInfoBroadcast = sqlContext.sparkSession.sparkContext

Review comment:
   In case of normal load flow, NewCarbonDataLoadRDD extends carbonRDD. 
While initializing, we get carbonSessionInfo from current thread and in compute 
of carbonRDD we set to the thread by 
`ThreadLocalSessionInfo.setCarbonSessionInfo(carbonSessionInfo)`. We can either 
do the same here or broadcast.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377552



##
File path: 
integration/spark/src/test/resources/badrecords/invalidTimeStampRange.csv
##
@@ -0,0 +1,2 @@
+ID,date,starttime,country,name,phonetype,serialname,salary

Review comment:
   ok, made changes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377258



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377325



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {

Review comment:
   done

##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+if (timeValue < minValue || timeValue > maxValue) {
+  if (LOGGER.isDebugEnabled()) {
+LOGGER.debug("Value for timestamp type column is not in valid range.");
+  }
+  throw new NumberFormatException("Value for timestamp type column is not 
in valid range.");

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475377187



##
File path: core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
##
@@ -435,19 +436,48 @@ public static Object 
getDataDataTypeForNoDictionaryColumn(String dimensionValue,
 
   private static Object parseTimestamp(String dimensionValue, String 
dateFormat) {
 Date dateToStr;
-DateFormat dateFormatter;
+DateFormat dateFormatter = null;
 try {
   if (null != dateFormat && !dateFormat.trim().isEmpty()) {
 dateFormatter = new SimpleDateFormat(dateFormat);
-dateFormatter.setLenient(false);
   } else {
 dateFormatter = timestampFormatter.get();
   }
+  dateFormatter.setLenient(false);
   dateToStr = dateFormatter.parse(dimensionValue);
-  return dateToStr.getTime();
+  return validateTimeStampRange(dateToStr.getTime());
 } catch (ParseException e) {
-  throw new NumberFormatException(e.getMessage());
+  // If the parsing fails, try to parse again with setLenient to true if 
the property is set
+  if (CarbonProperties.getInstance().isSetLenientEnabled()) {
+try {
+  LOGGER.info("Changing setLenient to true for TimeStamp: " + 
dimensionValue);
+  dateFormatter.setLenient(true);
+  dateToStr = dateFormatter.parse(dimensionValue);
+  LOGGER.info("Changing " + dimensionValue + " to " + dateToStr);
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  return validateTimeStampRange(dateToStr.getTime());
+} catch (ParseException ex) {
+  dateFormatter.setLenient(false);
+  LOGGER.info("Changing setLenient back to false");
+  throw new NumberFormatException(ex.getMessage());
+}
+  } else {
+throw new NumberFormatException(e.getMessage());
+  }
+}
+  }
+
+  private static Long validateTimeStampRange(Long timeValue) {
+long minValue = DateDirectDictionaryGenerator.MIN_VALUE;
+long maxValue = DateDirectDictionaryGenerator.MAX_VALUE;
+if (timeValue < minValue || timeValue > maxValue) {
+  if (LOGGER.isDebugEnabled()) {

Review comment:
   ok, removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-23 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r475346002



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -1592,6 +1592,13 @@ private CarbonCommonConstants() {
 
   public static final String CARBON_LUCENE_INDEX_STOP_WORDS_DEFAULT = "false";
 
+  // Property to enable parsing the data with setLenient = true in load flow 
if it fails with

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ShreelekhyaG commented on a change in pull request #3896: [CARBONDATA-3955] Fix load failures due to daylight saving time changes

2020-08-21 Thread GitBox


ShreelekhyaG commented on a change in pull request #3896:
URL: https://github.com/apache/carbondata/pull/3896#discussion_r474557660



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##
@@ -816,10 +816,20 @@ object CarbonDataRDDFactory {
   val partitionByRdd = keyRDD.partitionBy(
 new SegmentPartitioner(segmentIdIndex, segmentUpdateParallelism))
 
+  val conf = 
SparkSQLUtil.broadCastHadoopConf(sqlContext.sparkSession.sparkContext, 
hadoopConf)
+  val carbonSessionInfo: CarbonSessionInfo = {

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org