[GitHub] [carbondata] akashrn5 commented on pull request #3911: [CARBONDATA-3793]Fix update and delete issue when multiple partition columns are present and clean files issue

2020-09-10 Thread GitBox


akashrn5 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690873243


   @kunal642 build passed, please help to review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3982) Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3982:


 Summary: Use Partition instead of Span to split legacy and 
non-legacy segments for executor distribution in indexserver 
 Key: CARBONDATA-3982
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3982
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI

2020-09-10 Thread GitBox


Karan-c980 commented on pull request #3876:
URL: https://github.com/apache/carbondata/pull/3876#issuecomment-690870873


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#discussion_r486760013



##
File path: 
integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala
##
@@ -230,6 +230,37 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf
 }
   }
 
+  def buildOnlyBinary(rows: Int, sortColumns: Array[String], path : String): Any = {

Review comment:
   Can you add the filter query to the existing binary test case only? I guess there is no need to add a new test case for it, since that would slow down the CI running time. A minimal sketch of this suggestion follows.
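
   A minimal sketch of the suggestion, reusing the data already written by the existing binary test; the prestoServer.executeQuery helper and the table/column names here are assumptions for illustration, not the PR's code:

       // Sketch: add a filter assertion to the existing binary test instead of
       // creating a new test case; table and column names are illustrative.
       val rows = prestoServer.executeQuery(
         "select * from files where binaryField = cast('abc' as varbinary)")
       assert(rows.nonEmpty) // before the fix, this query failed during serialisation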





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#discussion_r486758917



##
File path: 
integration/presto/src/main/prestosql/org/apache/carbondata/presto/PrestoFilterUtil.java
##
@@ -78,6 +78,8 @@ private static DataType spi2CarbondataTypeMapper(HiveColumnHandle columnHandle)
 HiveType colType = columnHandle.getHiveType();
 if (colType.equals(HiveType.HIVE_BOOLEAN)) {
   return DataTypes.BOOLEAN;
+} else if (colType.equals(HiveType.HIVE_BINARY)) {

Review comment:
   I can see that the byte and float data types are also missing. Can you add them and test for them?
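
   For context, a hedged Scala sketch of the extended mapping, with the suggested byte and float cases added; the HiveType constants and DataTypes fields are assumed from the diff above, and the fallback is illustrative:

       import io.prestosql.plugin.hive.HiveType
       import org.apache.carbondata.core.metadata.datatype.{DataType, DataTypes}

       // Sketch of spi2CarbondataTypeMapper's if/else chain as a match;
       // BYTE and FLOAT are the suggested additions, not yet in the diff.
       def spi2CarbondataType(colType: HiveType): DataType = colType match {
         case HiveType.HIVE_BOOLEAN => DataTypes.BOOLEAN
         case HiveType.HIVE_BINARY  => DataTypes.BINARY
         case HiveType.HIVE_BYTE    => DataTypes.BYTE    // suggested addition
         case HiveType.HIVE_FLOAT   => DataTypes.FLOAT   // suggested addition
         case other => throw new IllegalArgumentException(s"Unsupported type: $other")
       }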





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690714479


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2305/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690713492


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4043/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690598521


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2304/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690590126


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4042/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690552025


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2303/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690537698


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4041/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [CARBONDATA-3793]Fix update and delete issue when multiple partition columns are present and clean files issue

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690502936


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2301/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [CARBONDATA-3793]Fix update and delete issue when multiple partition columns are present and clean files issue

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690500906


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4040/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3981) Presto filter check on binary datatype

2020-09-10 Thread Akshay (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay updated CARBONDATA-3981:
---
Description: 
Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries.

"select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation.

So the required check has been added in PrestoFilterUtil.java

  was:
Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries.

"select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation.


> Presto filter check on binary datatype
> --
>
> Key: CARBONDATA-3981
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3981
> Project: CarbonData
>  Issue Type: Bug
>  Components: presto-integration
>Reporter: Akshay
>Priority: Major
>
> Due to the absence of a binary datatype check, there was a problem during
> object serialisation in Presto filter queries.
> "select * from table where bin = cast('abc' as varbinary)" threw an error
> during serialisation.
> So the required check has been added in PrestoFilterUtil.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] akkio-97 opened a new pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype

2020-09-10 Thread GitBox


akkio-97 opened a new pull request #3920:
URL: https://github.com/apache/carbondata/pull/3920


### Why is this PR needed?
Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries.

### What changes were proposed in this PR?
   A binary datatype check has been added in PrestoFilterUtil.java
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3981) Presto filter check on binary datatype

2020-09-10 Thread Akshay (Jira)
Akshay created CARBONDATA-3981:
--

 Summary: Presto filter check on binary datatype
 Key: CARBONDATA-3981
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3981
 Project: CarbonData
  Issue Type: Bug
  Components: presto-integration
Reporter: Akshay


Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries.

"select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-10 Thread GitBox


ajantha-bhat commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690370071


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries

2020-09-10 Thread GitBox


kunal642 commented on pull request #3861:
URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690369177


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #2729: [WIP] Carbon Store Size Optimization and Query Performance Improvement

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #2729:
URL: https://github.com/apache/carbondata/pull/2729#issuecomment-690352650


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2302/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] asfgit closed pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


asfgit closed pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading

2020-09-10 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3961.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Reorder filter according to the column storage ordinal to improve reading
> -
>
> Key: CARBONDATA-3961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Kunal Kapoor
>Assignee: Kunal Kapoor
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CARBONDATA-3969) Fix Deserialization issue with DataType class

2020-09-10 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat reassigned CARBONDATA-3969:


Assignee: Indhumathi Muthumurugesh

> Fix Deserialization issue with DataType class
> -
>
> Key: CARBONDATA-3969
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3969
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3969) Fix Deserialization issue with DataType class

2020-09-10 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3969.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Fix Deserialization issue with DataType class
> -
>
> Key: CARBONDATA-3969
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3969
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-10 Thread GitBox


asfgit closed pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3911: [CARBONDATA-3793]Fix update and delete issue when multiple partition columns are present and clean files issue

2020-09-10 Thread GitBox


akashrn5 commented on pull request #3911:
URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690346056


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class

2020-09-10 Thread GitBox


ajantha-bhat commented on pull request #3910:
URL: https://github.com/apache/carbondata/pull/3910#issuecomment-690337464


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


ajantha-bhat commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690331643


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


kunal642 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690327900


   @ajantha-bhat @QiangCai @akashrn5 build passed
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3918:
URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690300076


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4039/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3918:
URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690298089


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2300/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-690295508


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2299/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3913:
URL: https://github.com/apache/carbondata/pull/3913#issuecomment-690292515


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4038/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (CARBONDATA-3923) Support global sort for Secondary index table

2020-09-10 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3923.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Support global sort for Secondary index table
> -
>
> Key: CARBONDATA-3923
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3923
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> SI always uses local sort to create the segment. If global sort is used, a
> filter on the SI column can give faster results.
>  
> So, support global sort for Secondary Index. 
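
A minimal end-to-end sketch of the feature, distilled from the test case discussed in this thread (table and index names are illustrative):

    // Create an SI with global sort so that filters on the index columns are fast.
    sql("create table t (name string, id string, country string) stored as carbondata")
    sql("insert into t select 'xx', '2', 'china' union all select 'xx', '1', 'india'")
    sql("create index t_index on table t(id, country) as 'carbondata' " +
        "properties('sort_scope'='global_sort', 'global_sort_partitions'='3')")
    sql("select name from t where country = 'india'") // can be served via t_index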



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


asfgit closed pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


kunal642 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690252527


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690245938


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486228364



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   You can just do drop table; it will drop the index too, so there is no need to separately run drop index (see the sketch below). I also suggest giving a better table name and index name, and please check the other tests for the same input.
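
   A one-line sketch of the behaviour being relied on here (names taken from the test above):

       // Dropping the parent table cascades to its secondary index tables,
       // so the explicit "drop index table1_index on table1" is redundant.
       sql("drop table if exists table1") // table1_index is dropped along with it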

##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   a) I am not talking about overhead. Why call the command when that will be handled by drop table? Why take the effort to call another command? Please remove it, and do the same for the other test case.
   b) Even though it is not an example file, we should always give proper and meaningful names. Just because a user uses Carbon and sees the code, we can't give non-meaningful names, right...!!!

##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486264018



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   a) I know, but calling drop index will not add extra overhead.
   b) For test cases, these names are enough! This is not an example file.

##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   >  we can't give non-meaningful names, right...!!!
   
   table1 is a meaningful name to represent it as a table; it is like John Wick calling his dog "dog". On a lighter note, stop focusing on unimportant things (table1 is used in 100 other places as well). As an experienced developer, I do know when code is not readable. 

##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, 

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486289292



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   I can see from your PR #3608 that you have used t1 as the table name and i1 as the index name in DropTableTest. Are those clean and meaningful names? I don't want to argue further. 
   table1 is still a table name; I have not named it car or bike. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486279565



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   It's not about experience, but I always prefer the code to be very clean and meaningful, so that a reader or developer is happy reading it. Clean and meaningful names are a very important aspect of any code...!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690218874


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4035/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690218157


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2297/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690217853


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2296/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486276366



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   >  we can't give non-meaningful names, right...!!!
   
   table1 is a meaningful name to represent it as a table; it is like John Wick calling his dog "dog". On a lighter note, stop focusing on unimportant things (table1 is used in 100 other places as well). As an experienced developer, I do know when code is not readable. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690216620


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4036/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486269619



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   a) I am not talking about overhead. Why call the command when that will be handled by drop table? Why take the effort to call another command? Please remove it, and do the same for the other test case.
   b) Even though it is not an example file, we should always give proper and meaningful names. Just because a user uses Carbon and sees the code, we can't give non-meaningful names, right...!!!





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3918:
URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690200478


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4034/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3919:
URL: https://github.com/apache/carbondata/pull/3919#issuecomment-690199098


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4037/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486266977



##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -89,7 +104,7 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
 try {
   pruneFilterProject(
 l,
-projects.filterNot(_.name.equalsIgnoreCase(CarbonCommonConstants.POSITION_ID)),

Review comment:
   Because the positionId should not always be removed; it has to be removed only under certain conditions (for example, when the `isPositionIDRequested` property is not set).
   
   Inside this, `getRequestedColumns` will take care of removing it based on those conditions.
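
   A hypothetical sketch of the conditional described here; the method signature and the boolean flag are assumptions, and only CarbonCommonConstants.POSITION_ID comes from the diff above:

       import org.apache.spark.sql.catalyst.expressions.NamedExpression
       import org.apache.carbondata.core.constants.CarbonCommonConstants

       // Keep the implicit positionId column only when it was explicitly
       // requested; otherwise filter it out of the projection.
       def getRequestedColumns(projects: Seq[NamedExpression],
           isPositionIDRequested: Boolean): Seq[NamedExpression] = {
         if (isPositionIDRequested) {
           projects
         } else {
           projects.filterNot(_.name.equalsIgnoreCase(CarbonCommonConstants.POSITION_ID))
         }
       }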





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3919:
URL: https://github.com/apache/carbondata/pull/3919#issuecomment-690196778


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2298/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3918:
URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690196116


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2295/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486264018



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   a) I know, but calling drop index will not add extra overhead.
   b) For test cases these names are enough! This is not an example file.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486261767



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {
+  val globalSortPartitionsProp = propertiesMap("global_sort_partitions")
+  var pass = false
+  try {
+val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp)
+if (globalSortPartitions > 0) {
+  pass = true
+}
+  } catch {
+case _ =>
+  }
+  if (!pass) {

Review comment:
   If there were only one condition I would have done that, but I also need to 
check `globalSortPartitions > 0` and throw the same error. Hence handling the 
error in one place.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified

2020-09-10 Thread SHREELEKHYA GAMPA (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SHREELEKHYA GAMPA updated CARBONDATA-3980:
--
   Description: 
When the partition column is loaded with a bad record value, the load fails 
with a 'Job aborted' message in the cluster. However, in the complete stack 
trace we can see the actual error message. ('Data load failed due to bad 
record: The value with column name projectjoindate and column data type 
TIMESTAMP is not a valid TIMESTAMP type')

Bug id: BUG2020082802430
PR link: https://github.com/apache/carbondata/pull/3919

  was:
When the partition column is loaded with a bad record value, load fails with 
'Job aborted' message in cluster. However in complete stack trace we can see 
the actual error message. ('Data load failed due to bad record: The value with 
column name projectjoindate and column data type TIMESTAMP is not a valid 
TIMESTAMP type') 

Bug id: BUG2020082802430


Remaining Estimate: (was: 0h)

> Load fails with aborted exception when Bad records action is unspecified
> 
>
> Key: CARBONDATA-3980
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3980
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
>  Time Spent: 10m
>
> When the partition column is loaded with a bad record value, load fails with 
> 'Job aborted' message in cluster. However in complete stack trace we can see 
> the actual error message. ('Data load failed due to bad record: The value 
> with column name projectjoindate and column data type TIMESTAMP is not a 
> valid TIMESTAMP type') 
> Bug id: BUG2020082802430
> PR link: https://github.com/apache/carbondata/pull/3919



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified

2020-09-10 Thread GitBox


ShreelekhyaG opened a new pull request #3919:
URL: https://github.com/apache/carbondata/pull/3919


### Why is this PR needed?
   Load fails with aborted exception when Bad records action is unspecified.
   
   When the partition column is loaded with a bad record value, the load fails 
with a 'Job aborted' message in the cluster. However, in the complete stack 
trace we can see the actual error message. (For example, 'Data load failed due 
to bad record: The value with column name projectjoindate and column data type 
TIMESTAMP is not a valid TIMESTAMP type'.)

### What changes were proposed in this PR?
Fix the bad record error message for the partition column. Added the error 
message to the `operationContext` map; if it is not null, an exception is 
thrown with the `errorMessage` from `CarbonLoadDataCommand`.
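
A hedged sketch of that propagation, assuming a mutable map for 
`operationContext` and a hypothetical key name (the real key and call sites 
live in `CarbonLoadDataCommand`):

```scala
import scala.collection.mutable

// The bad-record handler records the root cause instead of letting it be
// swallowed by a generic "Job aborted" failure.
val operationContext = mutable.Map[String, String]()
operationContext.put("errorMessage",
  "Data load failed due to bad record: The value with column name " +
  "projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type")

// After the load job fails, surface the recorded cause if one is present.
operationContext.get("errorMessage") match {
  case Some(msg) => throw new RuntimeException(msg)
  case None => // re-throw the original exception instead
}
```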
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No, tested in cluster.
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified

2020-09-10 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-3980:
-

 Summary: Load fails with aborted exception when Bad records action 
is unspecified
 Key: CARBONDATA-3980
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3980
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


When the partition column is loaded with a bad record value, the load fails 
with a 'Job aborted' message in the cluster. However, in the complete stack 
trace we can see the actual error message. ('Data load failed due to bad 
record: The value with column name projectjoindate and column data type 
TIMESTAMP is not a valid TIMESTAMP type')

Bug id: BUG2020082802430




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] kunal642 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


kunal642 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486246357



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
##
@@ -89,7 +104,7 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 try {
   pruneFilterProject(
 l,
-
projects.filterNot(_.name.equalsIgnoreCase(CarbonCommonConstants.POSITION_ID)),

Review comment:
   why is the filter removed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


kunal642 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486243458



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {
+  val globalSortPartitionsProp = propertiesMap("global_sort_partitions")
+  var pass = false
+  try {
+val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp)
+if (globalSortPartitions > 0) {
+  pass = true
+}
+  } catch {
+case _ =>
+  }
+  if (!pass) {

Review comment:
   no, keeping this variable doesn't make sense. Please catch the parsing 
exception and throw MalformedCarbonCommandException directly.
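   
   For reference, one shape that satisfies both checks (unparsable and 
non-positive values) with a single error path and no flag; the exception class 
is the one already used in the tests above, while the import path and 
surrounding types are simplified assumptions:
   
   ```scala
   import scala.collection.mutable
   // import path assumed for this sketch
   import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
   
   def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = {
     propertiesMap.get("global_sort_partitions").foreach { prop =>
       // A value is valid only if it parses and is strictly positive.
       val positive =
         try { Integer.parseInt(prop) > 0 }
         catch { case _: NumberFormatException => false }
       if (!positive) {
         throw new MalformedCarbonCommandException(
           s"Table property global_sort_partitions : $prop is invalid")
       }
     }
   }
   ```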





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486228364



##
File path: 
index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
##
@@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with 
BeforeAndAfterAll {
   .contains("Alter table drop column operation failed:"))
   }
 
+  test("test create secondary index global sort after insert") {
+sql("drop table if exists table1")
+sql("create table table1 (name string, id string, country string) stored 
as carbondata")
+sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', 
'1', 'india'")
+sql("create index table1_index on table table1(id, country) as 
'carbondata' properties" +
+"('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+checkAnswerWithoutSort(sql("select id, country from table1_index"),
+  Seq(Row("1", "india"), Row("2", "china")))
+// check for valid sort_scope
+checkExistence(sql("describe formatted table1_index"), true, "Sort Scope 
global_sort")
+// check the invalid sort scope
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+  .getMessage
+  .contains("Invalid SORT_SCOPE tim_sort"))
+// check for invalid global_sort_partitions
+assert(intercept[MalformedCarbonCommandException](sql(
+  "create index index_2 on table table1(id, country) as 'carbondata' 
properties" +
+  "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+  .getMessage
+  .contains("Table property global_sort_partitions : -1 is invalid"))
+sql("drop index table1_index on table1")

Review comment:
   You can just do drop table; it will drop the index too, so there is no need 
to run drop index separately. Also, suggest giving better table and index 
names, and please check the other tests for the same input.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690129651


   @akashrn5 : handled comments, please check and merge once build passes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218886



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -428,4 +552,40 @@ object SecondaryIndexCreator {
 }
 threadPoolSize
   }
+
+  def dataFrameOfSegments(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  projections: String,
+  segments: Array[String]): DataFrame = {
+try {
+  CarbonUtils
+.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
+   carbonTable.getDatabaseName + CarbonCommonConstants.POINT +
+   carbonTable.getTableName,
+  segments.mkString(","))

Review comment:
   Moved. These were produced by the reformat command itself (Ctrl + Alt + 
Shift + L), so we need the correct formatter settings to reformat properly, or 
should not use it.

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = getCopyObject(secondaryIndexModel)
-carbonLoadModel
-  
.setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment))
-
carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath)
-val secondaryIndexCreationStatus = new 
CarbonSecondaryIndexRDD(sc.sparkSession,
-  new SecondaryIndexCreationResultImpl,
-  carbonLoadModel,
-  secondaryIndexModel.secondaryIndex,
-  segId, execInstance, indexCarbonTable, forceAccessSegment, 
isCompactionCall).collect()
+  var successSISegments: List[String] = List()
+  var failedSISegments: List[String] = List()
+  val sort_scope = 
indexCarbonTable.getTableInfo.getFactTable.getTableProperties
+.get("sort_scope")
+  if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) {
+val mainTable = 
secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var futureObjectList = List[java.util.concurrent.Future[Array[(String,
+  (LoadMetadataDetails, ExecutionErrors))]]]()
+for (eachSegment <- validSegmentList) {
+  futureObjectList :+= executorService
+.submit(new Callable[Array[(String, (LoadMetadataDetails, 
ExecutionErrors))]] {
+  @throws(classOf[Exception])
+  override def call(): Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = {
+val carbonLoadModel = getCopyObject(secondaryIndexModel)
+// loading, we need to query main table add position reference
+val proj = indexCarbonTable.getCreateOrderColumn
+  .asScala
+  .map(_.getColName)
+  .filterNot(_.equals("positionReference")).toSet

Review comment:
   done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = 

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218662



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -297,6 +298,10 @@ object CarbonIndexUtil {
   segmentIdToLoadStartTimeMapping = scala.collection.mutable
 .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp))
 }
+val indexCarbonTable = CarbonEnv.getCarbonTable(

Review comment:
   ok.done

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = getCopyObject(secondaryIndexModel)
-carbonLoadModel
-  
.setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment))
-
carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath)
-val secondaryIndexCreationStatus = new 
CarbonSecondaryIndexRDD(sc.sparkSession,
-  new SecondaryIndexCreationResultImpl,
-  carbonLoadModel,
-  secondaryIndexModel.secondaryIndex,
-  segId, execInstance, indexCarbonTable, forceAccessSegment, 
isCompactionCall).collect()
+  var successSISegments: List[String] = List()
+  var failedSISegments: List[String] = List()
+  val sort_scope = 
indexCarbonTable.getTableInfo.getFactTable.getTableProperties
+.get("sort_scope")
+  if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) {
+val mainTable = 
secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var futureObjectList = List[java.util.concurrent.Future[Array[(String,
+  (LoadMetadataDetails, ExecutionErrors))]]]()
+for (eachSegment <- validSegmentList) {
+  futureObjectList :+= executorService
+.submit(new Callable[Array[(String, (LoadMetadataDetails, 
ExecutionErrors))]] {
+  @throws(classOf[Exception])
+  override def call(): Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = {
+val carbonLoadModel = getCopyObject(secondaryIndexModel)
+// loading, we need to query main table add position reference
+val proj = indexCarbonTable.getCreateOrderColumn

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486219230



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {

Review comment:
   done

##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {
+  val globalSortPartitionsProp = propertiesMap("global_sort_partitions")
+  var pass = false
+  try {
+val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp)

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486219088



##
File path: docs/index/secondary-index-guide.md
##
@@ -84,7 +84,8 @@ EXPLAIN SELECT a from maintable where c = 'cd';
   'carbondata'
   PROPERTIES('table_blocksize'='1')
   ```
- 
+  **NOTE**:
+  * supported properties are table_blocksize, column_meta_cache, cache_level, 
carbon.column.compressor, sort_scope, global_sort_partitions

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218379



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = getCopyObject(secondaryIndexModel)
-carbonLoadModel
-  
.setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment))
-
carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath)
-val secondaryIndexCreationStatus = new 
CarbonSecondaryIndexRDD(sc.sparkSession,
-  new SecondaryIndexCreationResultImpl,
-  carbonLoadModel,
-  secondaryIndexModel.secondaryIndex,
-  segId, execInstance, indexCarbonTable, forceAccessSegment, 
isCompactionCall).collect()
+  var successSISegments: List[String] = List()
+  var failedSISegments: List[String] = List()
+  val sort_scope = 
indexCarbonTable.getTableInfo.getFactTable.getTableProperties
+.get("sort_scope")
+  if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) {
+val mainTable = 
secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var futureObjectList = List[java.util.concurrent.Future[Array[(String,
+  (LoadMetadataDetails, ExecutionErrors))]]]()
+for (eachSegment <- validSegmentList) {
+  futureObjectList :+= executorService
+.submit(new Callable[Array[(String, (LoadMetadataDetails, 
ExecutionErrors))]] {
+  @throws(classOf[Exception])
+  override def call(): Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = {
+val carbonLoadModel = getCopyObject(secondaryIndexModel)
+// loading, we need to query main table add position reference
+val proj = indexCarbonTable.getCreateOrderColumn
+  .asScala
+  .map(_.getColName)
+  .filterNot(_.equals("positionReference")).toSet
+val explodeColumn = mainTable.getCreateOrderColumn.asScala
+  .filter(x => x.getDataType.isComplexType &&
+   proj.contains(x.getColName))
+var dataFrame = dataFrameOfSegments(sc.sparkSession,
+  mainTable,
+  proj.mkString(","),
+  Array(eachSegment))
+// flatten the complex SI
+if (explodeColumn.nonEmpty) {
+  val columns = dataFrame.schema.map { x =>
+if (x.name.equals(explodeColumn.head.getColName)) {
+  functions.explode_outer(functions.col(x.name))
+} else {
+  functions.col(x.name)
+}
+  }
+  dataFrame = dataFrame.select(columns: _*)
+}
+val dataLoadSchema = new CarbonDataLoadSchema(indexCarbonTable)
+carbonLoadModel.setCarbonDataLoadSchema(dataLoadSchema)
+carbonLoadModel.setTableName(indexCarbonTable.getTableName)
+
carbonLoadModel.setDatabaseName(indexCarbonTable.getDatabaseName)
+carbonLoadModel.setTablePath(indexCarbonTable.getTablePath)
+carbonLoadModel.setFactTimeStamp(secondaryIndexModel
+  .segmentIdToLoadStartTimeMapping(eachSegment))
+carbonLoadModel.setSegmentId(eachSegment)
+var result: Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = null
+try {
+  val configuration = FileFactory.getConfiguration
+  
configuration.set(CarbonTableInputFormat.INPUT_SEGMENT_NUMBERS, eachSegment)
+  def findCarbonScanRDD(rdd: RDD[_]): Unit = {
+rdd match {
+  case d: CarbonScanRDD[_] =>

Review comment:
   done

##
File path: 

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218311



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = getCopyObject(secondaryIndexModel)
-carbonLoadModel
-  
.setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment))
-
carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath)
-val secondaryIndexCreationStatus = new 
CarbonSecondaryIndexRDD(sc.sparkSession,
-  new SecondaryIndexCreationResultImpl,
-  carbonLoadModel,
-  secondaryIndexModel.secondaryIndex,
-  segId, execInstance, indexCarbonTable, forceAccessSegment, 
isCompactionCall).collect()
+  var successSISegments: List[String] = List()
+  var failedSISegments: List[String] = List()
+  val sort_scope = 
indexCarbonTable.getTableInfo.getFactTable.getTableProperties
+.get("sort_scope")
+  if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) {
+val mainTable = 
secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var futureObjectList = List[java.util.concurrent.Future[Array[(String,
+  (LoadMetadataDetails, ExecutionErrors))]]]()
+for (eachSegment <- validSegmentList) {
+  futureObjectList :+= executorService
+.submit(new Callable[Array[(String, (LoadMetadataDetails, 
ExecutionErrors))]] {
+  @throws(classOf[Exception])
+  override def call(): Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = {
+val carbonLoadModel = getCopyObject(secondaryIndexModel)
+// loading, we need to query main table add position reference
+val proj = indexCarbonTable.getCreateOrderColumn
+  .asScala
+  .map(_.getColName)
+  .filterNot(_.equals("positionReference")).toSet
+val explodeColumn = mainTable.getCreateOrderColumn.asScala
+  .filter(x => x.getDataType.isComplexType &&
+   proj.contains(x.getColName))
+var dataFrame = dataFrameOfSegments(sc.sparkSession,
+  mainTable,
+  proj.mkString(","),
+  Array(eachSegment))
+// flatten the complex SI
+if (explodeColumn.nonEmpty) {
+  val columns = dataFrame.schema.map { x =>
+if (x.name.equals(explodeColumn.head.getColName)) {
+  functions.explode_outer(functions.col(x.name))
+} else {
+  functions.col(x.name)
+}
+  }
+  dataFrame = dataFrame.select(columns: _*)
+}
+val dataLoadSchema = new CarbonDataLoadSchema(indexCarbonTable)
+carbonLoadModel.setCarbonDataLoadSchema(dataLoadSchema)
+carbonLoadModel.setTableName(indexCarbonTable.getTableName)
+
carbonLoadModel.setDatabaseName(indexCarbonTable.getDatabaseName)
+carbonLoadModel.setTablePath(indexCarbonTable.getTablePath)
+carbonLoadModel.setFactTimeStamp(secondaryIndexModel
+  .segmentIdToLoadStartTimeMapping(eachSegment))
+carbonLoadModel.setSegmentId(eachSegment)
+var result: Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = null
+try {
+  val configuration = FileFactory.getConfiguration
+  
configuration.set(CarbonTableInputFormat.INPUT_SEGMENT_NUMBERS, eachSegment)
+  def findCarbonScanRDD(rdd: RDD[_]): Unit = {
+rdd match {
+  case d: CarbonScanRDD[_] =>
+d.setValidateSegmentToAccess(false)
+   

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486217971



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
##
@@ -95,6 +95,8 @@ class CarbonScanRDD[T: ClassTag](
 
   private var readCommittedScope: ReadCommittedScope = _
 
+  private var validateSegmentToAccess: Boolean = true

Review comment:
   done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (CARBONDATA-3979) Added Hive local dictionary support example

2020-09-10 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-3979:
-

 Summary: Added Hive local dictionary support example
 Key: CARBONDATA-3979
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3979
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


 To verify local dictionary support in Hive for the carbon tables created from 
Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


kunal642 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690124448


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] kunal642 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-10 Thread GitBox


kunal642 commented on pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690123684


   @QiangCai build passed



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486204559



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -428,4 +552,40 @@ object SecondaryIndexCreator {
 }
 threadPoolSize
   }
+
+  def dataFrameOfSegments(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  projections: String,
+  segments: Array[String]): DataFrame = {
+try {
+  CarbonUtils
+.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
+   carbonTable.getDatabaseName + CarbonCommonConstants.POINT +
+   carbonTable.getTableName,
+  segments.mkString(","))
+  val logicalPlan = sparkSession
+.sql(s"select $projections from ${ carbonTable.getDatabaseName }.${
+  carbonTable
+.getTableName
+}")

Review comment:
   Moved. These were produced by the reformat command itself (Ctrl + Alt + 
Shift + L), so we need the correct formatter settings to reformat properly, or 
should not use it.
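   
   (Separately, for readers of the `dataFrameOfSegments` diff above: it relies 
on thread-local segment scoping. A hedged sketch of that pattern follows, with 
`CarbonUtils.threadSet` as used in the diff, an assumed import path for 
`CarbonUtils`, and a `threadUnset` cleanup that is assumed to mirror it:)
   
   ```scala
   // import path assumed for this sketch
   import org.apache.spark.sql.{CarbonUtils, DataFrame, SparkSession}
   
   // Scope a query to specific segments for the current thread only, and
   // always clear the override so later queries see all segments again.
   def segmentScopedQuery(spark: SparkSession, dbName: String, tableName: String,
       projections: String, segments: Array[String]): DataFrame = {
     val key = s"carbon.input.segments.$dbName.$tableName"
     try {
       CarbonUtils.threadSet(key, segments.mkString(","))
       spark.sql(s"select $projections from $dbName.$tableName")
     } finally {
       CarbonUtils.threadUnset(key) // assumption: unset mirrors threadSet
     }
   }
   ```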





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 opened a new pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread GitBox


Indhumathi27 opened a new pull request #3918:
URL: https://github.com/apache/carbondata/pull/3918


   
   
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


ajantha-bhat commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486196569



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {
+  val globalSortPartitionsProp = propertiesMap("global_sort_partitions")
+  var pass = false
+  try {
+val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp)
+if (globalSortPartitions > 0) {
+  pass = true
+}
+  } catch {
+case _ =>
+  }
+  if (!pass) {

Review comment:
   I feel keeping a flag and handling the error in one place is better.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690105564


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4033/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690103617


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2294/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI

2020-09-10 Thread GitBox


akashrn5 commented on a change in pull request #3787:
URL: https://github.com/apache/carbondata/pull/3787#discussion_r486083368



##
File path: 
integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
##
@@ -801,6 +802,26 @@ object CommonUtil {
 }
   }
 
+  def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, 
String]): Unit = {
+if (propertiesMap.get("global_sort_partitions").isDefined) {

Review comment:
   replace with `propertiesMap.contains("global_sort_partitions")`

##
File path: docs/index/secondary-index-guide.md
##
@@ -84,7 +84,8 @@ EXPLAIN SELECT a from maintable where c = 'cd';
   'carbondata'
   PROPERTIES('table_blocksize'='1')
   ```
- 
+  **NOTE**:
+  * supported properties are table_blocksize, column_meta_cache, cache_level, 
carbon.column.compressor, sort_scope, global_sort_partitions

Review comment:
   ```suggestion
 * supported properties are table_blocksize, column_meta_cache, 
cache_level, carbon.column.compressor, sort_scope and global_sort_partitions.
   ```

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala
##
@@ -297,6 +298,10 @@ object CarbonIndexUtil {
   segmentIdToLoadStartTimeMapping = scala.collection.mutable
 .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp))
 }
+val indexCarbonTable = CarbonEnv.getCarbonTable(

Review comment:
   index table object is already present, please remove this

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -152,68 +158,181 @@ object SecondaryIndexCreator {
   LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to 
=" + execInstance)
 }
   }
-  var futureObjectList = List[java.util.concurrent.Future[Array[(String, 
Boolean)]]]()
-  for (eachSegment <- validSegmentList) {
-val segId = eachSegment
-futureObjectList :+= executorService.submit(new 
Callable[Array[(String, Boolean)]] {
-  @throws(classOf[Exception])
-  override def call(): Array[(String, Boolean)] = {
-
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo
-  .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf())
-var eachSegmentSecondaryIndexCreationStatus: Array[(String, 
Boolean)] = Array.empty
-CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, 
indexCarbonTable)
-val carbonLoadModel = getCopyObject(secondaryIndexModel)
-carbonLoadModel
-  
.setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment))
-
carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath)
-val secondaryIndexCreationStatus = new 
CarbonSecondaryIndexRDD(sc.sparkSession,
-  new SecondaryIndexCreationResultImpl,
-  carbonLoadModel,
-  secondaryIndexModel.secondaryIndex,
-  segId, execInstance, indexCarbonTable, forceAccessSegment, 
isCompactionCall).collect()
+  var successSISegments: List[String] = List()
+  var failedSISegments: List[String] = List()
+  val sort_scope = 
indexCarbonTable.getTableInfo.getFactTable.getTableProperties
+.get("sort_scope")
+  if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) {
+val mainTable = 
secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+var futureObjectList = List[java.util.concurrent.Future[Array[(String,
+  (LoadMetadataDetails, ExecutionErrors))]]]()
+for (eachSegment <- validSegmentList) {
+  futureObjectList :+= executorService
+.submit(new Callable[Array[(String, (LoadMetadataDetails, 
ExecutionErrors))]] {
+  @throws(classOf[Exception])
+  override def call(): Array[(String, (LoadMetadataDetails, 
ExecutionErrors))] = {
+val carbonLoadModel = getCopyObject(secondaryIndexModel)
+// loading, we need to query main table add position reference
+val proj = indexCarbonTable.getCreateOrderColumn

Review comment:
   rename to projections

##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala
##
@@ -428,4 +552,40 @@ object SecondaryIndexCreator {
 }
 threadPoolSize
   }
+
+  def dataFrameOfSegments(
+  sparkSession: SparkSession,
+  carbonTable: CarbonTable,
+  projections: String,
+  segments: Array[String]): DataFrame = {
+try {
+  CarbonUtils
+.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
+   carbonTable.getDatabaseName + CarbonCommonConstants.POINT +
+   carbonTable.getTableName,
+  

[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-690060524


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2293/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3917:
URL: https://github.com/apache/carbondata/pull/3917#issuecomment-690056910


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4032/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690030669


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2291/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690026266


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4029/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3902:
URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690025455


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4030/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning

2020-09-10 Thread GitBox


CarbonDataQA1 commented on pull request #3908:
URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690020844


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2290/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org