[GitHub] [carbondata] vikramahuja1001 opened a new pull request #3894: [WIP] Added property to enable disable SIforFailed segments and added prope…

2020-08-17 Thread GitBox


vikramahuja1001 opened a new pull request #3894:
URL: https://github.com/apache/carbondata/pull/3894


   …rty to limit number of segments
   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-675280659


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2011/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-675279974


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3752/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [CARBONDATA-3927]Remove unwanted fields from tupleID to make it short and to improve store size and performance.

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-675279405


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3751/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3837: [CARBONDATA-3927]Remove unwanted fields from tupleID to make it short and to improve store size and performance.

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-675278720


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2010/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

2020-08-17 Thread GitBox


ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675277686


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3879:
URL: https://github.com/apache/carbondata/pull/3879#issuecomment-675276272


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3755/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

2020-08-17 Thread GitBox


ajantha-bhat commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675272382


   @akashrn5: The 2.4.5 build has a random failure that has also been observed 
in other PRs. You can merge this.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675270450


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3750/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675268681


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2009/
   







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-17 Thread GitBox


akashrn5 commented on a change in pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#discussion_r471918176



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -145,47 +150,162 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("drop table if exists carbon_table")
   }
 
-  test("test insert / update with data more than 32000 characters") {
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Redirect") {
+val testdata =s"$resourcesPath/MoreThan32KChar.csv"
+FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
+sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar 
OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+  s"'BAD_RECORDS_ACTION'='REDIRECT','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+var redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", 
"0", "0")
+assert(checkRedirectedCsvContentAvailableInSource(testdata, 
redirectCsvPath))
+val longChar: String = RandomStringUtils.randomAlphabetic(33000)
+
 CarbonProperties.getInstance()
   
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "true")
-val testdata =s"$resourcesPath/32000char.csv"
-sql("drop table if exists load32000chardata")
-sql("drop table if exists load32000chardata_dup")
-sql("CREATE TABLE load32000chardata(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
-sql("CREATE TABLE load32000chardata_dup(dim1 String, dim2 String, mes1 
int) STORED AS carbondata")
-sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata 
OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION, 
"REDIRECT");
+sql(s"insert into longerthan32kchar values('33000', '$longChar', 4)")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2)))
+redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "1", 
"0")
+var redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+var iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("33000,"+longChar+",4"))
+}
+
+// Update strings of length greater than 32000
+sql(s"update longerthan32kchar set(longerthan32kchar.dim2)=('$longChar') " 
+
+  "where longerthan32kchar.mes1=1").show()
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("itsok", 
"hello", 2)))
+redirectCsvPath = getRedirectCsvPath("default", "longerthan32kchar", "0", 
"1")
+redirectedFileLineList = FileUtils.readLines(redirectCsvPath)
+iterator = redirectedFileLineList.iterator()
+while (iterator.hasNext) {
+  assert(iterator.next().equals("ok,"+longChar+",1"))
+}
+CarbonProperties.getInstance()
+  
.addProperty(CarbonCommonConstants.CARBON_ENABLE_BAD_RECORD_HANDLING_FOR_INSERT,
 "false")
+
+// Insert longer string without converter step will throw exception
 intercept[Exception] {
-  sql("insert into load32000chardata_dup select 
dim1,concat(load32000chardata.dim2,''),mes1 from load32000chardata").show()
+  sql(s"insert into longerthan32kchar values('32000', '$longChar', 3)")
 }
-sql(s"LOAD DATA LOCAL INPATH '$testdata' into table load32000chardata_dup 
OPTIONS('FILEHEADER'='dim1,dim2,mes1')")
+
+FileFactory.deleteAllFilesOfDir(new File(CarbonProperties.getInstance()
+  .getProperty(CarbonCommonConstants.CARBON_BADRECORDS_LOC)))
+  }
+
+  test("test load / insert / update with data more than 32000 characters and 
bad record action as Force") {
+val testdata =s"$resourcesPath/MoreThan32KChar.csv"
+sql("CREATE TABLE longerthan32kchar(dim1 String, dim2 String, mes1 int) 
STORED AS carbondata")
+sql(s"LOAD DATA LOCAL INPATH '$testdata' into table longerThan32kChar 
OPTIONS('FILEHEADER'='dim1,dim2,mes1', " +
+  s"'BAD_RECORDS_ACTION'='FORCE','BAD_RECORDS_LOGGER_ENABLE'='TRUE')")
+checkAnswer(sql("select * from longerthan32kchar"), Seq(Row("ok", "hi", 
1), Row("itsok", "hello", 2), Row("32123", null, 3)))

Review comment:
   Move `testdata` and the create and load commands into a method that takes 
the bad record action as a parameter. This code is common to the test cases, so 
extracting it keeps the tests clean.
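
A minimal sketch of the refactoring suggested above. The suite under review is Scala; Java is used here only to keep the example self-contained, and the class and method names (`BadRecordLoadSql`, `createTable`, `loadData`) are hypothetical, not taken from the PR. The statement text is copied from the diff, with the bad record action turned into a parameter.

```java
/** Hypothetical helper for illustration; not part of the PR under review. */
final class BadRecordLoadSql {

  /** CREATE statement shared by the REDIRECT and FORCE test cases. */
  static String createTable(String table) {
    return "CREATE TABLE " + table
        + "(dim1 String, dim2 String, mes1 int) STORED AS carbondata";
  }

  /** LOAD statement; only the bad record action differs between the tests. */
  static String loadData(String csvPath, String table, String badRecordsAction) {
    return "LOAD DATA LOCAL INPATH '" + csvPath + "' into table " + table
        + " OPTIONS('FILEHEADER'='dim1,dim2,mes1', "
        + "'BAD_RECORDS_ACTION'='" + badRecordsAction + "',"
        + "'BAD_RECORDS_LOGGER_ENABLE'='TRUE')";
  }
}
```

In the Scala suite the same shape would presumably be a private method that passes these strings to `sql(...)`, called once per test with `"REDIRECT"` or `"FORCE"`.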

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestLoadDataGeneral.scala
##
@@ -145,47 +150,162 @@ class TestLoadDataGeneral extends QueryTest with 
BeforeAndAfterEach {
 sql("drop table if exists carbon_table")
   }
 
-  test("test insert /

[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r471912427



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##
@@ -281,7 +287,11 @@ private CarbonTableCacheModel 
getValidCacheBySchemaTableName(SchemaTableName sch
   createInputFormat(jobConf, carbonTable.getAbsoluteTableIdentifier(),
   new IndexFilter(carbonTable, filters, true), filteredPartitions);
   Job job = Job.getInstance(jobConf);
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.IS_QUERY_FROM_PRESTO, "true");

Review comment:
   I think we can add this only in the connector classes. Moved to 
carbonDataModule. Please check if it is OK @ajantha-bhat @kunal642 









[GitHub] [carbondata] akashrn5 commented on a change in pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.

2020-08-17 Thread GitBox


akashrn5 commented on a change in pull request #3879:
URL: https://github.com/apache/carbondata/pull/3879#discussion_r471903933



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSource.scala
##
@@ -281,10 +281,22 @@ object CarbonSource {
   isExternal)
 val updatedFormat = CarbonToSparkAdapter
   .getUpdatedStorageFormat(storageFormat, updatedTableProperties, 
tableInfo.getTablePath)

Review comment:
   If it's handled, please add some validations, maybe in 
`GeoTableExampleWithCarbonSession.scala`, to check that the geo hash column is 
added to the schema for the Hive table.









[GitHub] [carbondata] kunal642 commented on pull request #3855: [CARBONDATA-3863], after using index service clean the temp data

2020-08-17 Thread GitBox


kunal642 commented on pull request #3855:
URL: https://github.com/apache/carbondata/pull/3855#issuecomment-675237567


   retest this please







[GitHub] [carbondata] akashrn5 commented on a change in pull request #3879: [CARBONDATA-3943] Handling the addition of geo column to hive at the time of table creation.

2020-08-17 Thread GitBox


akashrn5 commented on a change in pull request #3879:
URL: https://github.com/apache/carbondata/pull/3879#discussion_r471902010



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/CarbonSource.scala
##
@@ -281,10 +281,22 @@ object CarbonSource {
   isExternal)
 val updatedFormat = CarbonToSparkAdapter
   .getUpdatedStorageFormat(storageFormat, updatedTableProperties, 
tableInfo.getTablePath)

Review comment:
   Here I can see the handling for `createCatalogTableForCarbonExtension` 
only. Is it handled for `createCatalogTableForCarbonSession`?










[GitHub] [carbondata] kunal642 commented on a change in pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-17 Thread GitBox


kunal642 commented on a change in pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#discussion_r471901716



##
File path: 
integration/presto/src/main/prestodb/org/apache/carbondata/presto/impl/CarbonTableReader.java
##
@@ -281,7 +287,11 @@ private CarbonTableCacheModel 
getValidCacheBySchemaTableName(SchemaTableName sch
   createInputFormat(jobConf, carbonTable.getAbsoluteTableIdentifier(),
   new IndexFilter(carbonTable, filters, true), filteredPartitions);
   Job job = Job.getInstance(jobConf);
+  CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.IS_QUERY_FROM_PRESTO, "true");

Review comment:
   @ajantha-bhat If we have a common place, it would be better to move it there.









[GitHub] [carbondata] kunal642 commented on pull request #3837: [CARBONDATA-3927]Remove unwanted fields from tupleID to make it short and to improve store size and performance.

2020-08-17 Thread GitBox


kunal642 commented on pull request #3837:
URL: https://github.com/apache/carbondata/pull/3837#issuecomment-675236391


   retest this please







[GitHub] [carbondata] akashrn5 commented on pull request #3858: [CARBONDATA-3919] Improve concurrent query performance

2020-08-17 Thread GitBox


akashrn5 commented on pull request #3858:
URL: https://github.com/apache/carbondata/pull/3858#issuecomment-675231491


   retest this please







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-675020057


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2008/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-67500


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3749/
   







[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-17 Thread GitBox


ajantha-bhat edited a comment on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-674940989


   @akkio-97 : Thanks for working on this. 
   
   Based on the above comments (the array design is not vectorized, fill vector 
needs to be dissolved and kept as original, the interface needs a default 
implementation, not all data types are handled, delta flows are not handled, 
null values are not handled), **it will be very hard to stabilize this based on 
the current design**.
   **I have analyzed this and reworked it with a new design in #3887,** and 
**I will add you as co-author there.** 
   **You can close this PR** and later raise PRs for local dict support, 
multi-level array & struct support, and map support. 







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3893: Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if executor LRU

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#issuecomment-674955113


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2007/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3893: Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if executor LRU

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893#issuecomment-674954792


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3748/
   







[GitHub] [carbondata] Indhumathi27 commented on pull request #3885: [CARBONDATA-3946] Support IndexServer with Presto Engine

2020-08-17 Thread GitBox


Indhumathi27 commented on pull request #3885:
URL: https://github.com/apache/carbondata/pull/3885#issuecomment-674941930


   retest this please







[GitHub] [carbondata] ajantha-bhat commented on pull request #3773: [CARBONDATA-3830]Presto array columns read support

2020-08-17 Thread GitBox


ajantha-bhat commented on pull request #3773:
URL: https://github.com/apache/carbondata/pull/3773#issuecomment-674940989


   @akkio-97 : Thanks for working on this. 
   
   Based on the above comments (the array design is not vectorized, fill vector 
needs to be dissolved and kept as original, the interface needs a default 
implementation, not all data types are handled, delta flows are not handled), 
**it will be very hard to stabilize this based on the current design**.
   **I have analyzed this and reworked it with a new design in #3887,** and 
**I will add you as co-author there.** 
   **You can close this PR** and later raise PRs for local dict support, 
multi-level array & struct support, and map support. 







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-674907332


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2006/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-674898812


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3747/
   







[GitHub] [carbondata] Karan980 opened a new pull request #3893: Added new property to set the value of executor LRU cache size to 70% of the total executor memory in IndexServer, if executor LRU cache

2020-08-17 Thread GitBox


Karan980 opened a new pull request #3893:
URL: https://github.com/apache/carbondata/pull/3893


### Why is this PR needed?
   This PR sets the executor LRU cache memory to 70% of the executor memory 
size if it is not configured.


### What changes were proposed in this PR?
   Added a new property to set the executor LRU cache size to 70% of the executor memory.
   

### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
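
The fallback described in this PR can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the message does not name the property or the unit, so the configured value is passed in as a plain string in megabytes, and `ExecutorLruCacheDefault` and `resolveCacheSizeMb` are hypothetical names.

```java
/** Illustrative only; not CarbonData's actual property handling. */
final class ExecutorLruCacheDefault {

  /** Share of executor memory used when no cache size is configured. */
  static final int DEFAULT_PERCENT = 70;

  /**
   * Returns the configured cache size if one is set, otherwise 70% of the
   * executor memory. Both values are assumed to be in megabytes.
   */
  static long resolveCacheSizeMb(String configuredMb, long executorMemoryMb) {
    if (configuredMb != null && !configuredMb.trim().isEmpty()) {
      return Long.parseLong(configuredMb.trim());
    }
    // Integer arithmetic avoids floating-point rounding; the result is truncated.
    return executorMemoryMb * DEFAULT_PERCENT / 100;
  }
}
```

With these assumptions, an executor with 4096 MB and no configured value would get a 2867 MB cache, while an explicit setting is always used unchanged.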
   
   
   
   







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471459506



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -97,21 +97,31 @@ public void fillRequiredBlockData(RawBlockletColumnChunks 
blockChunkHolder)
 
   @Override
   public Object getDataBasedOnDataType(ByteBuffer dataBuffer) {
-Object[] data = fillData(dataBuffer);
+return getDataBasedOnDataType(dataBuffer, false);
+  }
+
+  @Override
+  public Object getDataBasedOnDataType(ByteBuffer dataBuffer, boolean 
getBytesData) {

Review comment:
   handled









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674861462


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3746/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674854838


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2005/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#issuecomment-674787907


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2004/
   







[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471366000



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that 
block which contain
* the data as per the applied filter
*/
-  private Map> blockIdToBlockletIdMapping;
+  private final Map> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List implicitFilterList) {
+final Logger LOGGER = 
LogServiceFactory.getLogService(getClass().getName());
 // initialize map with half the size of filter list as one block id can 
contain
 // multiple blocklets
 blockIdToBlockletIdMapping = new HashMap<>(implicitFilterList.size() / 2);
 for (Expression value : implicitFilterList) {
   String blockletPath = ((LiteralExpression) 
value).getLiteralExpValue().toString();
   addBlockEntry(blockletPath);
 }
+int complexFilterThreshold = 
CarbonProperties.getInstance().getComplexFilterThresholdForSI();
+isThresholdReached = implicitFilterList.size() > complexFilterThreshold;
+if (isThresholdReached) {
+  LOGGER.info("Implicit Filter Size: " + implicitFilterList.size() + ", 
Threshold is: "
+  + complexFilterThreshold);
+}
   }
 
-  public ImplicitExpression(Map> 
blockIdToBlockletIdMapping) {
+  public ImplicitExpression(Map> 
blockIdToBlockletIdMapping) {
 this.blockIdToBlockletIdMapping = blockIdToBlockletIdMapping;
   }
 
   private void addBlockEntry(String blockletPath) {

Review comment:
   handled
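
For readers following the hunk above, here is a small stand-alone sketch of the two ideas it combines: grouping blocklet entries under their block id, and flagging when the filter list exceeds a threshold. It is not the `ImplicitExpression` class. The `"<blockId>/<blockletId>"` path format, the class name `BlockletFilterSketch`, and the element types are assumptions made for this example, since the real parsing in `addBlockEntry` is not shown in the hunk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in; not the CarbonData ImplicitExpression class. */
final class BlockletFilterSketch {
  private final Map<String, Set<Integer>> blockToBlocklets;
  private final boolean thresholdReached;

  BlockletFilterSketch(List<String> blockletPaths, int threshold) {
    // One block can hold several blocklets, so half the list size is used
    // as the initial capacity, as in the hunk above.
    blockToBlocklets = new HashMap<>(blockletPaths.size() / 2);
    for (String path : blockletPaths) {
      // Assumed format for this sketch only: "<blockId>/<blockletId>".
      int split = path.lastIndexOf('/');
      String blockId = path.substring(0, split);
      int blockletId = Integer.parseInt(path.substring(split + 1));
      blockToBlocklets.computeIfAbsent(blockId, k -> new HashSet<>()).add(blockletId);
    }
    // The flag is set only when the number of entries is strictly above the threshold.
    thresholdReached = blockletPaths.size() > threshold;
  }

  Map<String, Set<Integer>> mapping() {
    return blockToBlocklets;
  }

  boolean isThresholdReached() {
    return thresholdReached;
  }
}
```

Note that the comparison is strict, so a list whose size equals the threshold does not set the flag.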









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365947



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
##
@@ -41,39 +44,62 @@
* map that contains the mapping of block id to the valid blocklets in that 
block which contain
* the data as per the applied filter
*/
-  private Map> blockIdToBlockletIdMapping;
+  private final Map> blockIdToBlockletIdMapping;
+
+  /**
+   * checks if implicit filter exceeds complex filter threshold
+   */
+  private boolean isThresholdReached;
 
   public ImplicitExpression(List implicitFilterList) {
+final Logger LOGGER = LogServiceFactory.getLogService(getClass().getName());

Review comment:
   moved









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365673



##
File path: 
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
##
@@ -221,4 +221,9 @@ public int numberOfNodes() {
   public List getBlockInfos() {

Review comment:
   removed getBlockInfos method









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3778: [CARBONDATA-3916] Support array complex type with SI

2020-08-17 Thread GitBox


Indhumathi27 commented on a change in pull request #3778:
URL: https://github.com/apache/carbondata/pull/3778#discussion_r471365424



##
File path: 
core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
##
@@ -39,7 +39,7 @@ public ArrayQueryType(String name, String parentName, int columnIndex) {
 
   @Override
   public void addChildren(GenericQueryType children) {
-if (this.getName().equals(children.getParentName())) {
+if (null == this.getName() || this.getName().equals(children.getParentName())) {

Review comment:
   removed this check

##
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##
@@ -2456,4 +2456,15 @@ private CarbonCommonConstants() {
* property which defines the insert stage flow
*/
   public static final String IS_INSERT_STAGE = "is_insert_stage";
+
+  /**
+   * Until the threshold for complex filter is reached, row id will be set to the bitset in
+   * implicit filter during secondary index pruning
+   */
+  public static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

Review comment:
   handled
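
   As an illustration of how a property such as carbon.si.complex.filter.threshold might be read, here is a small sketch using plain java.util.Properties. The class, the method, and the fallback behaviour are assumptions; the thread does not show CarbonProperties.getComplexFilterThresholdForSI() or the real default value.

```java
import java.util.Properties;

// Illustrative sketch only; not CarbonProperties.
final class SiThresholdSketch {
  static final String SI_COMPLEX_FILTER_THRESHOLD = "carbon.si.complex.filter.threshold";

  // Returns the configured threshold, or defaultValue when the property is
  // missing, not an integer, or negative (fallback policy is an assumption).
  static int complexFilterThreshold(Properties props, int defaultValue) {
    String raw = props.getProperty(SI_COMPLEX_FILTER_THRESHOLD);
    if (raw == null) {
      return defaultValue;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value >= 0 ? value : defaultValue;
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }
}
```

   Falling back to the default on invalid input keeps a mistyped setting from failing queries; logging the rejected value would be a reasonable addition.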









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-674764749


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3743/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3887: [WIP] Refactor #3773 and support struct type

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3887:
URL: https://github.com/apache/carbondata/pull/3887#issuecomment-674756309


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2003/
   







[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write

2020-08-17 Thread GitBox


ajantha-bhat commented on a change in pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#discussion_r471286842



##
File path: 
integration/flink/src/main/java/org/apache/carbon/core/metadata/StageManager.java
##
@@ -81,7 +81,7 @@ public static void writeStageInput(final String stageInputPath, final StageInput
   private static void writeSuccessFile(final String successFilePath) throws IOException {
 final DataOutputStream segmentStatusSuccessOutputStream =
 FileFactory.getDataOutputStream(successFilePath,
-CarbonCommonConstants.BYTEBUFFER_SIZE, 1024);
+CarbonCommonConstants.BYTEBUFFER_SIZE, 1024 * 1024 * 2);

Review comment:
   What if the file size is greater than 2 MB? Why was 2 MB selected here?
   Maybe we need to pass the actual file size?
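
   One way to avoid a hard-coded constant, sketched under assumptions: if the third argument is a file-system block size (the thread does not say), it could be derived from the file size and clamped to the smallest value the file system accepts. The helper below is hypothetical, and both the minimum and the alignment are supplied by the caller rather than taken from any real configuration.

```java
// Illustrative sketch only; not part of StageManager or FileFactory.
final class BlockSizeSketch {
  // Rounds fileSize up to the next multiple of `multiple`, then applies
  // minBlockSize as a lower bound. All three values come from the caller.
  static long chooseBlockSize(long fileSize, long minBlockSize, long multiple) {
    if (fileSize < 0 || minBlockSize <= 0 || multiple <= 0) {
      throw new IllegalArgumentException("sizes must be positive (fileSize may be 0)");
    }
    long rounded = ((fileSize + multiple - 1) / multiple) * multiple;
    return Math.max(minBlockSize, rounded);
  }
}
```

   For example, with a 1 MiB minimum and 512-byte alignment, a 1024-byte file yields 1048576 and a 3000000-byte file yields 3000320.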









[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674707292


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3742/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3865: [CARBONDATA-3928] Handled the Strings which length is greater than 32000 as a bad record.

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3865:
URL: https://github.com/apache/carbondata/pull/3865#issuecomment-674707040


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2002/
   







[GitHub] [carbondata] ajantha-bhat commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write

2020-08-17 Thread GitBox


ajantha-bhat commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-674706635


   @yutaoChina: Thanks for working on this.
   a) Please fix the compilation error.
   b) Please create a JIRA issue and add it to the issue header.







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3892: flink write carbon file to hdfs when file size is less than 1M,can't write

2020-08-17 Thread GitBox


CarbonDataQA1 commented on pull request #3892:
URL: https://github.com/apache/carbondata/pull/3892#issuecomment-674700072


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2001/
   


