[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-674699373


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3741/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-16 Thread GitBox


QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471254176



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)

   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
   why not call getObjectDataBasedOnDataType?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-16 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471242699



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)

   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
   Already added a new method getObjectDataBasedOnDataType. This boolean is still required because, in the fillData() method, getDataBasedOnDataType will be called for the complex children.
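
   For readers following this thread, a minimal sketch of the delegation pattern being described (the class body, fillData signature, and conversion logic below are illustrative assumptions, not the actual CarbonData source):

   import java.nio.ByteBuffer;

   // Hypothetical simplification: the single-argument overload keeps the old behaviour
   // by delegating with getBytesData = false, and fillData forwards the flag so complex
   // children can return either raw bytes or converted objects.
   class ArrayQueryTypeSketch {

     public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
       return getDataBasedOnDataType(dataBuffer, false);
     }

     public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {
       return fillData(dataBuffer, getBytesData);
     }

     private Object[] fillData(ByteBuffer dataBuffer, boolean getBytesData) {
       // Assumption: a single child element per buffer, kept as bytes or converted
       // to a String depending on the flag propagated from the caller.
       byte[] raw = new byte[dataBuffer.remaining()];
       dataBuffer.get(raw);
       return new Object[] { getBytesData ? raw : new String(raw) };
     }
   }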





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3952) After reset query not hitting MV

2020-08-16 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-3952:
-

 Summary: After reset query not hitting MV
 Key: CARBONDATA-3952
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3952
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


After reset query not hitting MV



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471232261



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/NonDictionaryFieldConverterImpl.java
##
@@ -82,21 +83,25 @@ public Object convert(Object value, BadRecordLogHolder logHolder)
           .getBytesBasedOnDataTypeForNoDictionaryColumn(dimensionValue, dataType, dateFormat);
       if (dataType == DataTypes.STRING
           && parsedValue.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException(String.format(
-            "Dataload failed, String size cannot exceed %d bytes,"
-                + " please consider long string data type",
-            CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT));
+        logHolder.setReason(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE);
+        String badRecordAction = CarbonProperties.getInstance()
+            .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION);
+        if (badRecordAction.equalsIgnoreCase(CarbonCommonConstants.FORCE_BAD_RECORD_ACTION)) {
+          parsedValue = getNullValue();
+        }
       }
       return parsedValue;
     } else {
       Object parsedValue = DataTypeUtil
           .getDataDataTypeForNoDictionaryColumn(dimensionValue, dataType, dateFormat);
       if (dataType == DataTypes.STRING && parsedValue.toString().length()
           > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException(String.format(
-            "Dataload failed, String size cannot exceed %d bytes,"
-                + " please consider long string data type",
-            CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT));
+        logHolder.setReason(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE);
+        if (CarbonProperties.getInstance()
+            .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION)

Review comment:
   done and added test cases for all the actions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471232072



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/loading/converter/impl/NonDictionaryFieldConverterImpl.java
##
@@ -82,21 +83,25 @@ public Object convert(Object value, BadRecordLogHolder logHolder)
           .getBytesBasedOnDataTypeForNoDictionaryColumn(dimensionValue, dataType, dateFormat);
       if (dataType == DataTypes.STRING
           && parsedValue.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException(String.format(
-            "Dataload failed, String size cannot exceed %d bytes,"
-                + " please consider long string data type",
-            CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT));
+        logHolder.setReason(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE);
+        String badRecordAction = CarbonProperties.getInstance()

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471232002



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -330,21 +331,35 @@ public void writeByteArray(Object input, DataOutputStream dataOutputStream,
     }
   }

+  private byte[] getNullForBytes(byte[] value) {
+    String badRecordAction = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION);
+    if (badRecordAction.equalsIgnoreCase(CarbonCommonConstants.FORCE_BAD_RECORD_ACTION)) {
+      if (this.carbonDimension.getDataType() == DataTypes.STRING) {
+        return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+      } else {
+        return CarbonCommonConstants.EMPTY_BYTE_ARRAY;
+      }
+    }
+    return value;
+  }
+
   private void checkAndWriteByteArray(Object input, DataOutputStream dataOutputStream,
       BadRecordLogHolder logHolder, Boolean isWithoutConverter, String parsedValue, byte[] value)
       throws IOException {
     if (isWithoutConverter) {
       if (this.carbonDimension.getDataType() == DataTypes.STRING && input instanceof String
           && ((String) input).length() > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException("Dataload failed, String size cannot exceed "
-            + CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " bytes");
+        logHolder.setReason(String.format(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE,
+            input.toString(), this.carbonDimension.getColName()));
+        value = getNullForBytes(value);
       }
       updateValueToByteStream(dataOutputStream, value);
     } else {
       if (this.carbonDimension.getDataType() == DataTypes.STRING
           && value.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException("Dataload failed, String size cannot exceed "
-            + CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " bytes");
+        logHolder.setReason(String.format(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE,

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471231678



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -330,21 +331,35 @@ public void writeByteArray(Object input, DataOutputStream dataOutputStream,
     }
   }

+  private byte[] getNullForBytes(byte[] value) {
+    String badRecordAction = CarbonProperties.getInstance()
+        .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION);
+    if (badRecordAction.equalsIgnoreCase(CarbonCommonConstants.FORCE_BAD_RECORD_ACTION)) {
+      if (this.carbonDimension.getDataType() == DataTypes.STRING) {
+        return CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY;
+      } else {
+        return CarbonCommonConstants.EMPTY_BYTE_ARRAY;
+      }
+    }
+    return value;
+  }
+
   private void checkAndWriteByteArray(Object input, DataOutputStream dataOutputStream,
       BadRecordLogHolder logHolder, Boolean isWithoutConverter, String parsedValue, byte[] value)
       throws IOException {
     if (isWithoutConverter) {
       if (this.carbonDimension.getDataType() == DataTypes.STRING && input instanceof String
           && ((String) input).length() > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException("Dataload failed, String size cannot exceed "
-            + CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " bytes");
+        logHolder.setReason(String.format(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE,
+            input.toString(), this.carbonDimension.getColName()));
+        value = getNullForBytes(value);

Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471231616



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -301,8 +301,9 @@ public void writeByteArray(Object input, DataOutputStream dataOutputStream,
       }
       if (this.carbonDimension.getDataType() == DataTypes.STRING
           && value.length > CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT) {
-        throw new CarbonDataLoadingException("Dataload failed, String size cannot exceed "
-            + CarbonCommonConstants.MAX_CHARS_PER_COLUMN_DEFAULT + " bytes");
+        logHolder.setReason(String.format(CarbonCommonConstants.STRING_LENGTH_EXCEEDED_MESSAGE,

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471231649



##
File path: 
processing/src/main/java/org/apache/carbondata/processing/datatypes/PrimitiveDataType.java
##
@@ -330,21 +331,35 @@ public void writeByteArray(Object input, DataOutputStream dataOutputStream,
 }
   }
 
+  private byte[] getNullForBytes(byte[] value) {

Review comment:
   removed.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471231426



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala
##
@@ -194,11 +194,10 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi
     // query should pass
     checkAnswer(sql("select * from testlongstring"),
       Seq(Row(1, "ab", "cool"), Row(1, "ab1", longChar), Row(1, "abc", longChar)))
-// insert long string should fail as unset is done

Review comment:
   As discussed, this change is required because we set the bad record action to FORCE when creating the instance of TestQueryExecutor.
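
   For context, a minimal sketch of what setting the bad record action to FORCE for a test run looks like, using the CarbonProperties calls already visible in the diffs above; the class and main method are illustrative, not the actual TestQueryExecutor code:

   import org.apache.carbondata.core.constants.CarbonCommonConstants;
   import org.apache.carbondata.core.util.CarbonProperties;

   public class ForceBadRecordActionSketch {
     public static void main(String[] args) {
       // Assumption: with action FORCE, over-length strings are stored as null
       // instead of failing the load, which is why the old intercept[Exception] was removed.
       CarbonProperties.getInstance()
           .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FORCE");
       System.out.println(CarbonProperties.getInstance()
           .getProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION));
     }
   }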





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471230945



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -154,13 +153,44 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
-    intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
-    }
+    checkAnswer(sql("select count(*) from load32000chardata"), Seq(Row(3)))
+    // Strings with length greater than 32000 will be considered as bad records and will be inserted as null in the table
+    sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(3)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3)))
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata_dup OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    // Updating a string with length greater than 32000 will invalidate the whole row
+    sql("update load32000chardata_dup set(load32000chardata_dup.dim2)=(select concat(load32000chardata.dim2,'') " +
+      "from load32000chardata where load32000chardata.mes1=3) where load32000chardata_dup.mes1=3").show()
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3), Row("32000", null, 3)))
+
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
+    // BAD_RECORD_ACTION = "REDIRECT"
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");

Review comment:
   Added a separate test case for each of the bad record actions and checked the redirected value from the file.
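
   As a rough illustration of "checked the redirect value from file": with BAD_RECORDS_ACTION set to REDIRECT, rejected rows are expected to be written out under the configured bad-records location. The sketch below only lists CSV files in a directory; the property name carbon.badRecords.location and the path are assumptions, not taken from this PR:

   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.Paths;
   import java.util.stream.Stream;

   public class RedirectedBadRecordCheckSketch {
     public static void main(String[] args) throws IOException {
       // Hypothetical directory that carbon.badRecords.location would point to in a test.
       Path badRecordsLoc = Paths.get("/tmp/carbon/badrecords");
       try (Stream<Path> files = Files.walk(badRecordsLoc)) {
         // A test would assert that the redirected CSV contains the over-length value.
         files.filter(p -> p.toString().endsWith(".csv"))
             .forEach(p -> System.out.println("redirected bad record file: " + p));
       }
     }
   }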





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471231039



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -154,13 +153,44 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
-    intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
-    }
+    checkAnswer(sql("select count(*) from load32000chardata"), Seq(Row(3)))
+    // Strings with length greater than 32000 will be considered as bad records and will be inserted as null in the table
+    sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(3)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3)))
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata_dup OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    // Updating a string with length greater than 32000 will invalidate the whole row
+    sql("update load32000chardata_dup set(load32000chardata_dup.dim2)=(select concat(load32000chardata.dim2,'') " +
+      "from load32000chardata where load32000chardata.mes1=3) where load32000chardata_dup.mes1=3").show()
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3), Row("32000", null, 3)))
+
+    val longChar: String = RandomStringUtils.randomAlphabetic(33000)
+    // BAD_RECORD_ACTION = "REDIRECT"
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "REDIRECT");
+    sql(s"insert into load32000chardata_dup values('32000', '$longChar', 3)")
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3), Row("32000", null, 3)))
+
+    // BAD_RECORD_ACTION = "IGNORE"
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "IGNORE");
+    sql(s"insert into load32000chardata_dup values('32000', '$longChar', 3)")
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3), Row("32000", null, 3)))
+
+    // BAD_RECORD_ACTION = "FAIL"
+    CarbonProperties.getInstance()
+      .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, "FAIL");
     intercept[Exception] {
-      sql("update load32000chardata_dup set(load32000chardata_dup.dim2)=(select concat(load32000chardata.dim2,'') from load32000chardata)").show()
+      sql(s"insert into load32000chardata_dup values('32000', '$longChar', 3)")
     }
+    checkAnswer(sql("select count(*) from load32000chardata_dup"), Seq(Row(6)))
+    checkAnswer(sql("select * from load32000chardata_dup where mes1=3"), Seq(Row("32000", null, 3), Row("32000", null, 3)))
+    CarbonProperties.getInstance()

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471230570



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -154,13 +153,44 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
-    intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()

Review comment:
   As discussed, we set the bad record action to FORCE when creating the instance of TestQueryExecutor; this is not done by the other test cases.

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -154,13 +153,44 @@ class TestLoadDataGeneral extends QueryTest with BeforeAndAfterEach {
     sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 int) STORED AS carbondata")
     sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
-    intercept[Exception] {
-      sql("insert into load32000chardata_dup select dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
-    }
+    checkAnswer(sql("select count(*) from load32000chardata"), Seq(Row(3)))
+    // Strings with length greater than 32000 will be considered as bad records and will be inserted as null in the table

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471229741



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -20,17 +20,16 @@ package org.apache.carbondata.integration.spark.testsuite.dataload
 import java.math.BigDecimal
 
 import scala.collection.mutable.ArrayBuffer
-

Review comment:
   done

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -20,17 +20,16 @@ package org.apache.carbondata.integration.spark.testsuite.dataload
 import java.math.BigDecimal
 
 import scala.collection.mutable.ArrayBuffer
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterEach
-

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] nihal0107 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


nihal0107 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471229623



##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2456,4 +2456,10 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  public static final String STRING_LENGTH_EXCEEDED_MESSAGE =
+  "Record %s of column %s exceeded " + MAX_CHARS_PER_COLUMN_DEFAULT +
+  " bytes. Please consider long string data type.";

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write

2020-08-16 Thread GitBox


ajantha-bhat commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-674649559







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] QiangCai commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-16 Thread GitBox


QiangCai commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r469772276



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
 
   @Override
   public void addChildren(GenericQueryType children) {
-    if (this.getName().equals(children.getParentName())) {
+    if (null == this.getName() || this.getName().equals(children.getParentName())) {

Review comment:
   When can the name be null?

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

Review comment:
   Better to move all SI constants together to one place in this class.

##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that 
block which contain
* the data as per the applied filter
*/
-  private Map> blockIdToBlockletIdMapping;
+  private final Map> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());

Review comment:
   move LOGGER to be a static field of this class
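
   A small sketch of that suggestion (class name is illustrative; the LogServiceFactory call is the one already used in the snippet above, and the log4j Logger import is an assumption):

   import org.apache.carbondata.common.logging.LogServiceFactory;
   import org.apache.log4j.Logger;

   public class ImplicitExpressionLoggerSketch {
     // Created once for the class instead of once per constructor invocation.
     private static final Logger LOGGER =
         LogServiceFactory.getLogService(ImplicitExpressionLoggerSketch.class.getName());
   }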

##
File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##
@@ -221,4 +221,9 @@ public int numberOfNodes() {
   public List getBlockInfos() {

Review comment:
   after we added getTableBlockInfo(), can we remove this method?

##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that 
block which contain
* the data as per the applied filter
*/
-  private Map> blockIdToBlockletIdMapping;
+  private final Map> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List implicitFilterList) {
+    final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());
     // initialize map with half the size of filter list as one block id can contain
     // multiple blocklets
     blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
     for (Expression value : implicitFilterList) {
       String blockletPath = ((LiteralExpression) value).getLiteralExpValue().toString();
       addBlockEntry(blockletPath);
     }
+    int complexFilterThreshold = CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+    isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+    if (isThresholdReached) {
+      LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", Threshold is: "
+          + complexFilterThreshold);
+    }
   }

-  public ImplicitExpression(Map> blockIdToBlockletIdMapping) {
+  public ImplicitExpression(Map> blockIdToBlockletIdMapping) {
 this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
   }
 
   private void addBlockEntry(String blockletPath) {

Review comment:
   The logic of this method is hard to understand.
   Can we add a flag into ImplicitExpression when it is created?
   If it is blocklet level, we call addBlockletEntry; if it is row level, we call addRowEntry.
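
   One possible shape of that suggestion, sketched in Java; the field names, path layout, and row-level handling are assumptions for illustration only, not the actual implementation:

   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   // Sketch: decide once, at construction time, whether the implicit filter entries are
   // blocklet level or row level, then route each path to the matching add method.
   class ImplicitExpressionEntrySketch {
     private final Map<String, Set<Integer>> blockIdToBlockletIdMapping = new HashMap<>();
     private final boolean rowLevel;

     ImplicitExpressionEntrySketch(boolean rowLevel) {
       this.rowLevel = rowLevel;
     }

     void addEntry(String blockletPath) {
       if (rowLevel) {
         addRowEntry(blockletPath);
       } else {
         addBlockletEntry(blockletPath);
       }
     }

     private void addBlockletEntry(String blockletPath) {
       // Assumed path layout: "<blockId>/<blockletId>"
       int sep = blockletPath.lastIndexOf('/');
       String blockId = blockletPath.substring(0, sep);
       int blockletId = Integer.parseInt(blockletPath.substring(sep + 1));
       blockIdToBlockletIdMapping.computeIfAbsent(blockId, k -> new HashSet<>()).add(blockletId);
     }

     private void addRowEntry(String blockletPath) {
       // Row-level entries would also record row ids in the bitset; kept as a stub here.
       addBlockletEntry(blockletPath);
     }
   }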
   
   

##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks blockChunkHolder)

   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-    Object[] data = fillData(dataBuffer);
+    return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean getBytesData) {

Review comment:
   How about keeping the old method and adding a new method getObjectDataBasedOnDataType?
   It would not need this boolean parameter.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674577192


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2000/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674576877


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3739/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674566583


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1999/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674565481


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3738/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674555949


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3737/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674555827


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1998/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674553276


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3736/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674553169


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1997/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674552745


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3735/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-16 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674552642


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1996/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org