[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-21 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474466612



##
File path: 
streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
##
@@ -54,11 +54,12 @@ object FieldConverter {
   value match {
 case s: String => if (!isVarcharType && !isComplexType &&
   s.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-  throw new IllegalArgumentException(stringLengthExceedErrorMsg +
-CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
-} else {
-  s
+  if (!CarbonProperties.getInstance.getProperty(CarbonCommonConstants
+.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT).toBoolean) {
+throw new 
IllegalArgumentException(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE)

Review comment:
   `STRING_LENGTH_EXCEEDED_MESSAGE` has format specifiers("%s") in it. 
Instead of throwing with format specifiers here and then catching it in 
`carbonScalaUtil.getString()` and formatting it, suggest to throw the formatted 
string itself from here. Can cause security vulnerability issue if we for 
forget to catch such exception and modify it. Also suggest to add FMT prefix to 
string `STRING_LENGTH_EXCEEDED_MESSAGE`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-21 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474476530



##
File path: 
streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
##
@@ -54,11 +54,12 @@ object FieldConverter {
   value match {
 case s: String => if (!isVarcharType && !isComplexType &&
   s.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-  throw new IllegalArgumentException(stringLengthExceedErrorMsg +
-CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
-} else {
-  s
+  if (!CarbonProperties.getInstance.getProperty(CarbonCommonConstants

Review comment:
   
`CarbonProperties.getInstance().getProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT)`
 can return null. Please use `isBadRecordHandlingEnabledForInsert()` method 
instead of it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-21 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474466612



##
File path: 
streaming/src/main/scala/org/apache/carbondata/streaming/parser/FieldConverter.scala
##
@@ -54,11 +54,12 @@ object FieldConverter {
   value match {
 case s: String => if (!isVarcharType && !isComplexType &&
   s.length > 
CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-  throw new IllegalArgumentException(stringLengthExceedErrorMsg +
-CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " characters")
-} else {
-  s
+  if (!CarbonProperties.getInstance.getProperty(CarbonCommonConstants
+.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT).toBoolean) {
+throw new 
IllegalArgumentException(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE)

Review comment:
   `STRING_LENGTH_EXCEEDED_MESSAGE` has format specifiers("%s") in it. It 
is directly passed `IllegalArgumentException` without formatting. Can cause 
security vulnerability issue. Also suggest to add FMT prefix to 
`STRING_LENGTH_EXCEEDED_MESSAGE` so that we don't miss such issues.
   
   `public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
 "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
 " characters. Please consider long string data type.";`
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-21 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474446976



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/RowConverterImpl.java
##
@@ -130,9 +135,14 @@ public CarbonRow convert(CarbonRow row) throws 
CarbonDataLoadingException {
   }
   fieldConverters[i].convert(row, logHolder);
   if (!logHolder.isLogged() && logHolder.isBadRecordNotAdded()) {
-badRecordLogger.addBadRecordsToBuilder(row.getRawData(), 
logHolder.getReason());
+String reason = logHolder.getReason();

Review comment:
   If we had set the correct reason with row and column in 
`NonDictionaryFieldConverterImpl.convert()` instead of setting reason as 
`STRING_LENGTH_EXCEEDED_MESSAGE`, alone in it, we could have avoided changes in 
this file(RowConverterImpl) to fixup the the correct reason for 
`STRING_LENGTH_EXCEEDED_MESSAGE` again here at a common convert for all the 
columns of row.
   That looks better. right ? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-20 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474425224



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -145,47 +152,153 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("drop table if exists carbon_table")
   }
 
-  test("test insert / update with data more than 32000 characters") {
+  private def createTableAndLoadData (badRecordAction: String): Unit = {
+BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
+sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar 
OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+  
s"'BAD_RECORDS_ACTION'='${badRecordAction}','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Redirect") {
+createTableAndLoadData("REDIRECT")
+var redirectCsvPath = BadRecordUtil
+  .getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+assert(BadRecordUtil.checkRedirectedCsvContentAvailableInSource(testdata, 
redirectCsvPath))
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "true")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, 
"REDIRECT");
+sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "1", "0")
+var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+var iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("33000,"+longChar+",4"))
+}
+
+// Update strings of length greater than 32000
+sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " 
+
+  "where longerthan32kchar.mes1=1").show()
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", 
"hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "0", "1")
+redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("ok,"+longChar+",1"))
+}
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "false")
+
+// Insert longer string without converter step will throw exception
+intercept[Exception] {
+  sql(s"insert into longerthan32kchar values('32000', '$longChar', 3)")
+}
+BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Force") {
+createTableAndLoadData("FORCE")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2), Row("32123", null, 3)))
 CarbonProperties.getInstance()
   
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "true")
-val testdata =s"$resourcesPath/32000char.csv"
-sql("drop table if exists load32000chardata")
-sql("drop table if exists load32000chardata_dup")
-sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
-sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 
int) STORED AS carbondata")
-sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata 
OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE");
+sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+checkAnswer(sql("select * from longerthan32kchar"),
+  Seq(Row("ok", "hi", 1), Row("itsok", "hello", 2), Row("32123", null, 3), 
Row("33000", null, 4)))
+
+// Update strings of length greater than 32000
+sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " 
+
+  "where longerthan32kchar.mes1=1").show()
+checkAnswer(sql("select * from longerthan32kchar"),
+  Seq(Row("ok", null, 1), Row("itsok", "hello", 2), Row("32123", null, 3), 
Row("33000", null, 4)))
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "false")
+
+// Insert longer string without converter step will throw exception
 intercept[Exception] {
-  sql("insert into load32000chardata_dup select 
dim1,concat(load32000chardata.dim2,'

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-20 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474418423



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -145,47 +152,153 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("drop table if exists carbon_table")
   }
 
-  test("test insert / update with data more than 32000 characters") {
+  private def createTableAndLoadData (badRecordAction: String): Unit = {
+BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
+sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar 
OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+  
s"'BAD_RECORDS_ACTION'='${badRecordAction}','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Redirect") {
+createTableAndLoadData("REDIRECT")
+var redirectCsvPath = BadRecordUtil
+  .getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+assert(BadRecordUtil.checkRedirectedCsvContentAvailableInSource(testdata, 
redirectCsvPath))
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "true")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, 
"REDIRECT");
+sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "1", "0")
+var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+var iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("33000,"+longChar+",4"))
+}
+
+// Update strings of length greater than 32000
+sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " 
+
+  "where longerthan32kchar.mes1=1").show()
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", 
"hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "0", "1")
+redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("ok,"+longChar+",1"))
+}
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "false")
+
+// Insert longer string without converter step will throw exception
+intercept[Exception] {

Review comment:
   After intercept, we can check for the string/substring we expect ? Same 
in below testcase also.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-20 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474418423



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -145,47 +152,153 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("drop table if exists carbon_table")
   }
 
-  test("test insert / update with data more than 32000 characters") {
+  private def createTableAndLoadData (badRecordAction: String): Unit = {
+BadRecordUtil.cleanBadRecordPath("default", "longerthan32kchar")
+sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
+sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar 
OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+  
s"'BAD_RECORDS_ACTION'='${badRecordAction}','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+  }
+
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Redirect") {
+createTableAndLoadData("REDIRECT")
+var redirectCsvPath = BadRecordUtil
+  .getRedirectCsvPath("default", "longerthan32kchar", "0", "0")
+assert(BadRecordUtil.checkRedirectedCsvContentAvailableInSource(testdata, 
redirectCsvPath))
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "true")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, 
"REDIRECT");
+sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "1", "0")
+var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+var iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("33000,"+longChar+",4"))
+}
+
+// Update strings of length greater than 32000
+sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " 
+
+  "where longerthan32kchar.mes1=1").show()
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", 
"hello", 2)))
+redirectCsvPath = BadRecordUtil.getRedirectCsvPath("default", 
"longerthan32kchar", "0", "1")
+redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("ok,"+longChar+",1"))
+}
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "false")
+
+// Insert longer string without converter step will throw exception
+intercept[Exception] {

Review comment:
   After intercept, we can check for the string/substring we expect ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-20 Thread GitBox


VenuReddy2103 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r474130519



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2468,4 +2468,10 @@ private CarbonCommonConstants() {
* index server temp folder aging period default value 3hours.
*/
   public static final String CARBON_INDEXSERVER_TEMPFOLDER_DELETETIME_DEFAULT 
= "1080";
+
+  public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
+  "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
+  " characters. Please consider long string data type.";
+
+  public static final String FORCE_BAD_RECORD_ACTION = "FORCE";

Review comment:
   This string constant do not seem be used anywhere. Please remove it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org