[GitHub] [carbondata] akashrn5 commented on pull request #3911: [CARBONDATA-3793] Fix update and delete issue when multiple partition columns are present and clean files issue
akashrn5 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690873243 @kunal642 build passed, please help to review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3982) Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
Indhumathi Muthumurugesh created CARBONDATA-3982:
Summary: Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
Key: CARBONDATA-3982
URL: https://issues.apache.org/jira/browse/CARBONDATA-3982
Project: CarbonData
Issue Type: Bug
Reporter: Indhumathi Muthumurugesh
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] Karan-c980 commented on pull request #3876: TestingCI
Karan-c980 commented on pull request #3876: URL: https://github.com/apache/carbondata/pull/3876#issuecomment-690870873 retest this please
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
ajantha-bhat commented on a change in pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#discussion_r486760013
## File path: integration/presto/src/test/scala/org/apache/carbondata/presto/integrationtest/PrestoTestNonTransactionalTableFiles.scala
## @@ -230,6 +230,37 @@ class PrestoTestNonTransactionalTableFiles extends FunSuiteLike with BeforeAndAf } }
+ def buildOnlyBinary(rows: Int, sortColumns: Array[String], path : String): Any = {
Review comment: Can you add the filter query to the existing binary testcase instead? I guess there is no need to add a new testcase for it; it would slow down the CI running time.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
ajantha-bhat commented on a change in pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#discussion_r486758917
## File path: integration/presto/src/main/prestosql/org/apache/carbondata/presto/PrestoFilterUtil.java
## @@ -78,6 +78,8 @@ private static DataType spi2CarbondataTypeMapper(HiveColumnHandle columnHandle) HiveType colType = columnHandle.getHiveType(); if (colType.equals(HiveType.HIVE_BOOLEAN)) { return DataTypes.BOOLEAN;
+} else if (colType.equals(HiveType.HIVE_BINARY)) {
Review comment: I can see that the byte and float data types are also missing. Can you add and test them as well?
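The gap the review points out can be sketched as follows. This is a hypothetical, simplified stand-in (a plain enum and strings instead of Presto's HiveType and CarbonData's DataTypes classes), only to illustrate where the BINARY, BYTE, and FLOAT branches fit in the mapping chain:

```java
// Hypothetical sketch of the SPI-to-Carbon type mapping discussed in the
// review; HiveKind and toCarbonType are illustrative stand-ins, not the
// real Presto or CarbonData API.
public class TypeMapperSketch {
    enum HiveKind { BOOLEAN, BINARY, BYTE, FLOAT, INT }

    // Mirrors the if/else chain in spi2CarbondataTypeMapper: each SPI type
    // needs an explicit branch; a missing branch is what broke filter
    // serialisation for binary columns.
    static String toCarbonType(HiveKind colType) {
        switch (colType) {
            case BOOLEAN: return "BOOLEAN";
            case BINARY:  return "BINARY"; // the branch added by this PR
            case BYTE:    return "BYTE";   // flagged as missing in the review
            case FLOAT:   return "FLOAT";  // flagged as missing in the review
            default:      return "INT";
        }
    }

    public static void main(String[] args) {
        System.out.println(toCarbonType(HiveKind.BINARY)); // prints BINARY
    }
}
```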
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
CarbonDataQA1 commented on pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690714479 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2305/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
CarbonDataQA1 commented on pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690713492 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4043/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
CarbonDataQA1 commented on pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690598521 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2304/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
CarbonDataQA1 commented on pull request #3920: URL: https://github.com/apache/carbondata/pull/3920#issuecomment-690590126 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4042/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690552025 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2303/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
CarbonDataQA1 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690537698 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4041/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [CARBONDATA-3793] Fix update and delete issue when multiple partition columns are present and clean files issue
CarbonDataQA1 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690502936 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2301/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3911: [CARBONDATA-3793] Fix update and delete issue when multiple partition columns are present and clean files issue
CarbonDataQA1 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690500906 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4040/
[jira] [Updated] (CARBONDATA-3981) Presto filter check on binary datatype
[ https://issues.apache.org/jira/browse/CARBONDATA-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshay updated CARBONDATA-3981:
Description: Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries. "select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation. So the required check has been added in PrestoFilterUtil.java.
was: Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries. "select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation.
> Presto filter check on binary datatype
> Key: CARBONDATA-3981
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3981
> Project: CarbonData
> Issue Type: Bug
> Components: presto-integration
> Reporter: Akshay
> Priority: Major
[GitHub] [carbondata] akkio-97 opened a new pull request #3920: [CARBONDATA-3981] Presto filter check on binary datatype
akkio-97 opened a new pull request #3920: URL: https://github.com/apache/carbondata/pull/3920
### Why is this PR needed? Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries.
### What changes were proposed in this PR? A binary datatype check has been added in PrestoFilterUtil.java.
### Does this PR introduce any user interface change? - No
### Is any new testcase added? - Yes
[jira] [Created] (CARBONDATA-3981) Presto filter check on binary datatype
Akshay created CARBONDATA-3981:
Summary: Presto filter check on binary datatype
Key: CARBONDATA-3981
URL: https://issues.apache.org/jira/browse/CARBONDATA-3981
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Reporter: Akshay
Due to the absence of a binary datatype check, there was a problem during object serialisation in Presto filter queries. "select * from table where bin = cast('abc' as varbinary)" threw an error during serialisation.
[GitHub] [carbondata] ajantha-bhat commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
ajantha-bhat commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690370071 retest this please
[GitHub] [carbondata] kunal642 commented on pull request #3861: [CARBONDATA-3922] Support order by limit push down for secondary index queries
kunal642 commented on pull request #3861: URL: https://github.com/apache/carbondata/pull/3861#issuecomment-690369177 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #2729: [WIP] Carbon Store Size Optimization and Query Performance Improvement
CarbonDataQA1 commented on pull request #2729: URL: https://github.com/apache/carbondata/pull/2729#issuecomment-690352650 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2302/
[GitHub] [carbondata] asfgit closed pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
asfgit closed pull request #3902: URL: https://github.com/apache/carbondata/pull/3902
[jira] [Resolved] (CARBONDATA-3961) Reorder filter according to the column storage ordinal to improve reading
[ https://issues.apache.org/jira/browse/CARBONDATA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3961.
Fix Version/s: 2.1.0
Resolution: Fixed
> Reorder filter according to the column storage ordinal to improve reading
> Key: CARBONDATA-3961
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3961
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Kunal Kapoor
> Assignee: Kunal Kapoor
> Priority: Major
> Fix For: 2.1.0
> Time Spent: 9h 10m
> Remaining Estimate: 0h
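The idea behind CARBONDATA-3961, evaluating filter columns in the order they are physically stored, can be sketched as below. The Predicate class and storageOrdinal field are hypothetical stand-ins for illustration, not CarbonData's actual expression types:

```java
import java.util.*;

public class FilterReorderSketch {
    // Hypothetical predicate: a column name plus its on-disk storage ordinal.
    static class Predicate {
        final String column;
        final int storageOrdinal;
        Predicate(String column, int storageOrdinal) {
            this.column = column;
            this.storageOrdinal = storageOrdinal;
        }
    }

    // Sort predicates by storage ordinal so column chunks are read
    // sequentially instead of seeking back and forth in the file.
    static List<Predicate> reorder(List<Predicate> filters) {
        List<Predicate> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(p -> p.storageOrdinal));
        return sorted;
    }

    public static void main(String[] args) {
        List<Predicate> filters = Arrays.asList(
            new Predicate("country", 2),
            new Predicate("id", 0),
            new Predicate("name", 1));
        for (Predicate p : reorder(filters)) {
            System.out.println(p.column); // prints id, name, country in order
        }
    }
}
```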
[jira] [Assigned] (CARBONDATA-3969) Fix Deserialization issue with DataType class
[ https://issues.apache.org/jira/browse/CARBONDATA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-3969:
Assignee: Indhumathi Muthumurugesh
> Fix Deserialization issue with DataType class
> Key: CARBONDATA-3969
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3969
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthumurugesh
> Assignee: Indhumathi Muthumurugesh
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (CARBONDATA-3969) Fix Deserialization issue with DataType class
[ https://issues.apache.org/jira/browse/CARBONDATA-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-3969.
Fix Version/s: 2.1.0
Resolution: Fixed
> Fix Deserialization issue with DataType class
> Key: CARBONDATA-3969
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3969
> Project: CarbonData
> Issue Type: Bug
> Reporter: Indhumathi Muthumurugesh
> Assignee: Indhumathi Muthumurugesh
> Priority: Major
> Fix For: 2.1.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[GitHub] [carbondata] asfgit closed pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class
asfgit closed pull request #3910: URL: https://github.com/apache/carbondata/pull/3910
[GitHub] [carbondata] akashrn5 commented on pull request #3911: [CARBONDATA-3793] Fix update and delete issue when multiple partition columns are present and clean files issue
akashrn5 commented on pull request #3911: URL: https://github.com/apache/carbondata/pull/3911#issuecomment-690346056 retest this please
[GitHub] [carbondata] ajantha-bhat commented on pull request #3910: [CARBONDATA-3969] Fix Deserialization issue with DataType class
ajantha-bhat commented on pull request #3910: URL: https://github.com/apache/carbondata/pull/3910#issuecomment-690337464 LGTM
[GitHub] [carbondata] ajantha-bhat commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
ajantha-bhat commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690331643 LGTM
[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690327900 @ajantha-bhat @QiangCai @akashrn5 build passed
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
CarbonDataQA1 commented on pull request #3918: URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690300076 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4039/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
CarbonDataQA1 commented on pull request #3918: URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690298089 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2300/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration
CarbonDataQA1 commented on pull request #3913: URL: https://github.com/apache/carbondata/pull/3913#issuecomment-690295508 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2299/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3913: [CARBONDATA-3974] Improve partition pruning performance in presto carbon integration
CarbonDataQA1 commented on pull request #3913: URL: https://github.com/apache/carbondata/pull/3913#issuecomment-690292515 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4038/
[jira] [Resolved] (CARBONDATA-3923) Support global sort for Secondary index table
[ https://issues.apache.org/jira/browse/CARBONDATA-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kunal Kapoor resolved CARBONDATA-3923.
Fix Version/s: 2.1.0
Resolution: Fixed
> Support global sort for Secondary index table
> Key: CARBONDATA-3923
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3923
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Ajantha Bhat
> Assignee: Ajantha Bhat
> Priority: Major
> Fix For: 2.1.0
> Time Spent: 7.5h
> Remaining Estimate: 0h
> SI always uses local sort to create the segment. If global sort is used, a filter on the SI column can give faster results. So, support global sort for the Secondary index.
[GitHub] [carbondata] asfgit closed pull request #3787: [CARBONDATA-3923] support global sort for SI
asfgit closed pull request #3787: URL: https://github.com/apache/carbondata/pull/3787
[GitHub] [carbondata] kunal642 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
kunal642 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690252527 LGTM
[GitHub] [carbondata] akashrn5 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690245938 LGTM
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486228364
## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala
## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) }
+ test("test create secondary index global sort after insert") {
+ sql("drop table if exists table1")
+ sql("create table table1 (name string, id string, country string) stored as carbondata")
+ sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'")
+ sql("create index table1_index on table table1(id, country) as 'carbondata' properties" +
+ "('sort_scope'='global_sort', 'Global_sort_partitions'='3')")
+ checkAnswerWithoutSort(sql("select id, country from table1_index"),
+ Seq(Row("1", "india"), Row("2", "china")))
+ // check for valid sort_scope
+ checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort")
+ // check the invalid sort scope
+ assert(intercept[MalformedCarbonCommandException](sql(
+ "create index index_2 on table table1(id, country) as 'carbondata' properties" +
+ "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')"))
+ .getMessage
+ .contains("Invalid SORT_SCOPE tim_sort"))
+ // check for invalid global_sort_partitions
+ assert(intercept[MalformedCarbonCommandException](sql(
+ "create index index_2 on table table1(id, country) as 'carbondata' properties" +
+ "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')"))
+ .getMessage
+ .contains("Table property global_sort_partitions : -1 is invalid"))
+ sql("drop index table1_index on table1")
Review comment: You can just do drop table; it will drop the index too, so there is no need to run drop index separately. I also suggest giving a better table name and index name, and please check the other tests for the same input.
(quoting the same hunk in TestSIWithSecondryIndex.scala as above)
Review comment: a) I am not talking about overhead; why call the command when that will be handled by drop table? Why take the effort to call another command? Please remove it, and the same for the other test case. b) Even though it is not an example file, we should always give proper and meaningful names.
Just because a user uses Carbon and sees the code, we can't give non-meaningful names, right?
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486264018
(quoting the same hunk in TestSIWithSecondryIndex.scala as above)
Review comment: a) I know, but calling drop index will not add extra overhead. b) For test cases these names are enough! This is not an example file.
(quoting the same hunk as above)
Review comment: > we cant give non meaningful names right...!!!
table1 is a meaningful name to represent it as a table; it is like John Wick calling his dog "dog". On a lighter note, stop focusing on unimportant things (table1 is used in 100 other places too). As an experienced developer, I do know when code is not readable.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486289292 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: I can see that from your PR #3608 , you have used t1 as table name and i1 as index name in DropTableTest. is that clean and meaningful name ? I don't want to argue further. 
Table1 is still a table name; I have not named it car or bike. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486279565 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: it's not about experience, but I always prefer the code to be clean and meaningful, so that a reader or developer is happy reading it. Clean and meaningful names are a very important aspect of any code...! 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690218874 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4035/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
CarbonDataQA1 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690218157 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2297/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690217853 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2296/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486276366 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: > we cant give non meaningful names right...!!! table1 is a meaningful name to represent it as a table, it is like john wick calling his dog as a dog. 
On a lighter note, stop focusing on unimportant things (table1 is used in 100 other places also). As an experienced developer, I do know when code is not readable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
CarbonDataQA1 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690216620 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4036/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486269619 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: a) not talking about overhead, why to call the command when that will be handled by drop table, why to take effort to call another command, please remove it and same for other test case. 
b) even though it's not an example file, we should always give proper and meaningful names. Just because a user uses carbon and sees the code, we can't give non-meaningful names, right...!!! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
CarbonDataQA1 commented on pull request #3918: URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690200478 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4034/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified
CarbonDataQA1 commented on pull request #3919: URL: https://github.com/apache/carbondata/pull/3919#issuecomment-690199098 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4037/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486266977 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ## @@ -89,7 +104,7 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { try { pruneFilterProject( l, - projects.filterNot(_.name.equalsIgnoreCase(CarbonCommonConstants.POSITION_ID)), Review comment: Because the position id should not always be removed; it should be removed only under certain conditions (for example, when the `isPositionIDRequested` property is not set). Inside this, `getRequestedColumns` takes care of removing it based on those conditions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
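The conditional removal described above can be modelled with a small sketch. The `Attribute` case class, the `POSITION_ID` constant, and the `isPositionIDRequested` flag below are simplified stand-ins for the Spark/Carbon types, not the actual API:

```scala
// Minimal model of the behaviour ajantha-bhat describes: the PositionID
// projection is dropped only when it was not explicitly requested.
// `Attribute` here is a stand-in for Spark's expression attribute type.
case class Attribute(name: String)

val POSITION_ID = "positionId" // illustrative value, not Carbon's real constant

def getRequestedColumns(projects: Seq[Attribute],
                        isPositionIDRequested: Boolean): Seq[Attribute] = {
  if (isPositionIDRequested) {
    // keep the pseudo-column so SI load / update flows can use it
    projects
  } else {
    // normal queries should not see the internal position id column
    projects.filterNot(_.name.equalsIgnoreCase(POSITION_ID))
  }
}
```

This is why an unconditional `filterNot` at the call site had to go: the decision depends on a per-query flag, so it belongs inside the column-resolution helper.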
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified
CarbonDataQA1 commented on pull request #3919: URL: https://github.com/apache/carbondata/pull/3919#issuecomment-690196778 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2298/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
CarbonDataQA1 commented on pull request #3918: URL: https://github.com/apache/carbondata/pull/3918#issuecomment-690196116 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2295/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486264018 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: a) I know, but calling drop index will not add extra overhead. b) For test cases these names are enough ! This is not an example file. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486261767 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { + val globalSortPartitionsProp = propertiesMap("global_sort_partitions") + var pass = false + try { +val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp) +if (globalSortPartitions > 0) { + pass = true +} + } catch { +case _ => + } + if (!pass) { Review comment: If there were only one condition I would have done that, but I also need to check `globalSortPartitions > 0` and throw the same error; hence the error is handled in one place. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified
[ https://issues.apache.org/jira/browse/CARBONDATA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SHREELEKHYA GAMPA updated CARBONDATA-3980: -- Description: When the partition column is loaded with a bad record value, load fails with 'Job aborted' message in cluster. However in complete stack trace we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 PR link: https://github.com/apache/carbondata/pull/3919 was: When the partition column is loaded with a bad record value, load fails with 'Job aborted' message in cluster. However in complete stack trace we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 Remaining Estimate: (was: 0h) > Load fails with aborted exception when Bad records action is unspecified > > > Key: CARBONDATA-3980 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3980 > Project: CarbonData > Issue Type: Bug >Reporter: SHREELEKHYA GAMPA >Priority: Minor > Time Spent: 10m > > When the partition column is loaded with a bad record value, load fails with > 'Job aborted' message in cluster. However in complete stack trace we can see > the actual error message. ('Data load failed due to bad record: The value > with column name projectjoindate and column data type TIMESTAMP is not a > valid TIMESTAMP type') > Bug id: BUG2020082802430 > PR link: https://github.com/apache/carbondata/pull/3919 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] ShreelekhyaG opened a new pull request #3919: [CARBONDATA-3980] Load fails with aborted exception when Bad records action is unspecified
ShreelekhyaG opened a new pull request #3919: URL: https://github.com/apache/carbondata/pull/3919 ### Why is this PR needed? Load fails with aborted exception when Bad records action is unspecified. When the partition column is loaded with a bad record value, load fails with 'Job aborted' message in cluster. However in complete stack trace we can see the actual error message. (Like, 'Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') ### What changes were proposed in this PR? Fix bad record error message for the partition column. Added the error message to `operationContext` map and if its not null throwing exception with `errorMessage` from `CarbonLoadDataCommand`. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - No, tested in cluster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
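The mechanism this PR describes can be sketched roughly as follows. `OperationContext` below is a simplified stand-in for Carbon's class, and the `dataLoadError` key is illustrative, not necessarily the property name used in the actual fix:

```scala
import scala.collection.mutable

// Simplified stand-in for Carbon's OperationContext: a per-operation
// key/value map that load listeners and commands can share.
class OperationContext {
  private val properties = mutable.Map[String, AnyRef]()
  def setProperty(key: String, value: AnyRef): Unit = properties.put(key, value)
  def getProperty(key: String): AnyRef = properties.getOrElse(key, null)
}

// Sketch of the fix: the bad-record handling path records the real error
// message in the context; the load command re-throws it so the user sees
// the actual cause instead of a generic "Job aborted".
def failLoadIfBadRecord(operationContext: OperationContext): Unit = {
  val errorMessage = operationContext.getProperty("dataLoadError")
  if (errorMessage != null) {
    throw new RuntimeException(errorMessage.toString)
  }
}
```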
[jira] [Created] (CARBONDATA-3980) Load fails with aborted exception when Bad records action is unspecified
SHREELEKHYA GAMPA created CARBONDATA-3980: - Summary: Load fails with aborted exception when Bad records action is unspecified Key: CARBONDATA-3980 URL: https://issues.apache.org/jira/browse/CARBONDATA-3980 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA When the partition column is loaded with a bad record value, load fails with 'Job aborted' message in cluster. However in complete stack trace we can see the actual error message. ('Data load failed due to bad record: The value with column name projectjoindate and column data type TIMESTAMP is not a valid TIMESTAMP type') Bug id: BUG2020082802430 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] kunal642 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
kunal642 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486246357 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ## @@ -89,7 +104,7 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { try { pruneFilterProject( l, - projects.filterNot(_.name.equalsIgnoreCase(CarbonCommonConstants.POSITION_ID)), Review comment: why is the filter removed? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] kunal642 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
kunal642 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486243458 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { + val globalSortPartitionsProp = propertiesMap("global_sort_partitions") + var pass = false + try { +val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp) +if (globalSortPartitions > 0) { + pass = true +} + } catch { +case _ => + } + if (!pass) { Review comment: no, keeping this variable doesn't make sense. Please catch the parsing exception and throw MalformedCarbonCommandException directly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
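One way to satisfy both comments in this thread (drop the mutable `pass` flag while still validating `> 0` and throwing from a single place) is sketched below. The exception class is a stand-in so the example is self-contained; the real `MalformedCarbonCommandException` lives in Carbon's own packages:

```scala
import scala.collection.mutable

// Stand-in for Carbon's MalformedCarbonCommandException, used here only
// to keep the sketch self-contained.
class MalformedCarbonCommandException(msg: String) extends Exception(msg)

// Sketch of the suggested refactor: parse and range-check in one boolean
// expression, no `var pass`, and a single throw site for the shared message.
def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = {
  propertiesMap.get("global_sort_partitions").foreach { prop =>
    val valid =
      try Integer.parseInt(prop) > 0
      catch { case _: NumberFormatException => false }
    if (!valid) {
      throw new MalformedCarbonCommandException(
        s"Table property global_sort_partitions : $prop is invalid")
    }
  }
}
```

Catching only `NumberFormatException` (rather than the original bare `case _ =>`) also avoids swallowing unrelated errors.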
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486228364 ## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithSecondryIndex.scala ## @@ -86,6 +86,64 @@ class TestSIWithSecondryIndex extends QueryTest with BeforeAndAfterAll { .contains("Alter table drop column operation failed:")) } + test("test create secondary index global sort after insert") { +sql("drop table if exists table1") +sql("create table table1 (name string, id string, country string) stored as carbondata") +sql("insert into table1 select 'xx', '2', 'china' union all select 'xx', '1', 'india'") +sql("create index table1_index on table table1(id, country) as 'carbondata' properties" + +"('sort_scope'='global_sort', 'Global_sort_partitions'='3')") +checkAnswerWithoutSort(sql("select id, country from table1_index"), + Seq(Row("1", "india"), Row("2", "china"))) +// check for valid sort_scope +checkExistence(sql("describe formatted table1_index"), true, "Sort Scope global_sort") +// check the invalid sort scope +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='tim_sort', 'Global_sort_partitions'='3')")) + .getMessage + .contains("Invalid SORT_SCOPE tim_sort")) +// check for invalid global_sort_partitions +assert(intercept[MalformedCarbonCommandException](sql( + "create index index_2 on table table1(id, country) as 'carbondata' properties" + + "('sort_scope'='global_sort', 'Global_sort_partitions'='-1')")) + .getMessage + .contains("Table property global_sort_partitions : -1 is invalid")) +sql("drop index table1_index on table1") Review comment: can just do drop table, it will drop index too, no need to separately run drop index and suggest to give a better tableName and index name and please check other test for same input. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
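The test quoted above exercises validation of the sort_scope property on a secondary index: an unknown scope such as tim_sort must be rejected with a message naming the bad value. A rough, self-contained sketch of that kind of check (class name, valid-scope set, and return convention are illustrative assumptions, not CarbonData's actual implementation):

```java
import java.util.Set;

public class SortScopeCheck {
    // Hypothetical mirror of the SORT_SCOPE validation the test exercises:
    // only a fixed set of scopes is accepted; anything else is rejected
    // with a message that names the offending value.
    private static final Set<String> VALID_SCOPES =
        Set.of("NO_SORT", "LOCAL_SORT", "GLOBAL_SORT");

    static String validateSortScope(String sortScope) {
        if (!VALID_SCOPES.contains(sortScope.toUpperCase())) {
            return "Invalid SORT_SCOPE " + sortScope;
        }
        return null; // null means the value is acceptable
    }

    public static void main(String[] args) {
        System.out.println(validateSortScope("global_sort")); // accepted -> null
        System.out.println(validateSortScope("tim_sort"));    // rejected with a message
    }
}
```

The case-insensitive comparison matches the test's use of lowercase 'global_sort' in the DDL while 'describe formatted' reports it as the canonical scope name.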
[GitHub] [carbondata] ajantha-bhat commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690129651 @akashrn5 : handled comments, please check and merge once build passes
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218886 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -428,4 +552,40 @@ object SecondaryIndexCreator { } threadPoolSize } + + def dataFrameOfSegments( + sparkSession: SparkSession, + carbonTable: CarbonTable, + projections: String, + segments: Array[String]): DataFrame = { +try { + CarbonUtils +.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + + carbonTable.getDatabaseName + CarbonCommonConstants.POINT + + carbonTable.getTableName, + segments.mkString(",")) Review comment: Moved. These are created by reformat command itself (ctrl + alt + shift + L), so need the correct tool to properly reformat or not use it. ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel = getCopyObject(secondaryIndexModel) -carbonLoadModel - .setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment)) - carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath) -val 
secondaryIndexCreationStatus = new CarbonSecondaryIndexRDD(sc.sparkSession, - new SecondaryIndexCreationResultImpl, - carbonLoadModel, - secondaryIndexModel.secondaryIndex, - segId, execInstance, indexCarbonTable, forceAccessSegment, isCompactionCall).collect() + var successSISegments: List[String] = List() + var failedSISegments: List[String] = List() + val sort_scope = indexCarbonTable.getTableInfo.getFactTable.getTableProperties +.get("sort_scope") + if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) { +val mainTable = secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var futureObjectList = List[java.util.concurrent.Future[Array[(String, + (LoadMetadataDetails, ExecutionErrors))]]]() +for (eachSegment <- validSegmentList) { + futureObjectList :+= executorService +.submit(new Callable[Array[(String, (LoadMetadataDetails, ExecutionErrors))]] { + @throws(classOf[Exception]) + override def call(): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +val carbonLoadModel = getCopyObject(secondaryIndexModel) +// loading, we need to query main table add position reference +val proj = indexCarbonTable.getCreateOrderColumn + .asScala + .map(_.getColName) + .filterNot(_.equals("positionReference")).toSet Review comment: done ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", 
SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel =
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218662 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -297,6 +298,10 @@ object CarbonIndexUtil { segmentIdToLoadStartTimeMapping = scala.collection.mutable .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp)) } +val indexCarbonTable = CarbonEnv.getCarbonTable( Review comment: ok.done ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel = getCopyObject(secondaryIndexModel) -carbonLoadModel - .setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment)) - carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath) -val secondaryIndexCreationStatus = new CarbonSecondaryIndexRDD(sc.sparkSession, - new SecondaryIndexCreationResultImpl, - carbonLoadModel, - secondaryIndexModel.secondaryIndex, - segId, execInstance, indexCarbonTable, forceAccessSegment, isCompactionCall).collect() + var successSISegments: List[String] = List() + var failedSISegments: List[String] = List() + val 
sort_scope = indexCarbonTable.getTableInfo.getFactTable.getTableProperties +.get("sort_scope") + if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) { +val mainTable = secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var futureObjectList = List[java.util.concurrent.Future[Array[(String, + (LoadMetadataDetails, ExecutionErrors))]]]() +for (eachSegment <- validSegmentList) { + futureObjectList :+= executorService +.submit(new Callable[Array[(String, (LoadMetadataDetails, ExecutionErrors))]] { + @throws(classOf[Exception]) + override def call(): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +val carbonLoadModel = getCopyObject(secondaryIndexModel) +// loading, we need to query main table add position reference +val proj = indexCarbonTable.getCreateOrderColumn Review comment: done
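The proj construction quoted in this review builds the projection for querying the main table from the index table's create-order columns, dropping the internal positionReference column. The list manipulation can be sketched standalone (column names and class are illustrative; the real code operates on CarbonColumn objects):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ProjectionBuilder {
    // Mirrors the Scala proj = createOrderColumns.map(_.getColName)
    //   .filterNot(_.equals("positionReference")).toSet logic:
    // keep every column name except the internal position reference,
    // preserving encounter order so the projection string is stable.
    static Set<String> buildProjection(List<String> createOrderColumns) {
        return createOrderColumns.stream()
            .filter(name -> !name.equals("positionReference"))
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        Set<String> proj = buildProjection(
            List.of("id", "country", "positionReference"));
        // Joined with commas, this becomes the SELECT list for the main table.
        System.out.println(String.join(",", proj));
    }
}
```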
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486219230 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { Review comment: done ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { + val globalSortPartitionsProp = propertiesMap("global_sort_partitions") + var pass = false + try { +val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp) Review comment: done
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486219088 ## File path: docs/index/secondary-index-guide.md ## @@ -84,7 +84,8 @@ EXPLAIN SELECT a from maintable where c = 'cd'; 'carbondata' PROPERTIES('table_blocksize'='1') ``` - + **NOTE**: + * supported properties are table_blocksize, column_meta_cache, cache_level, carbon.column.compressor, sort_scope, global_sort_partitions Review comment: done
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218379 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel = getCopyObject(secondaryIndexModel) -carbonLoadModel - .setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment)) - carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath) -val secondaryIndexCreationStatus = new CarbonSecondaryIndexRDD(sc.sparkSession, - new SecondaryIndexCreationResultImpl, - carbonLoadModel, - secondaryIndexModel.secondaryIndex, - segId, execInstance, indexCarbonTable, forceAccessSegment, isCompactionCall).collect() + var successSISegments: List[String] = List() + var failedSISegments: List[String] = List() + val sort_scope = indexCarbonTable.getTableInfo.getFactTable.getTableProperties +.get("sort_scope") + if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) { +val mainTable = secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var futureObjectList = List[java.util.concurrent.Future[Array[(String, + 
(LoadMetadataDetails, ExecutionErrors))]]]() +for (eachSegment <- validSegmentList) { + futureObjectList :+= executorService +.submit(new Callable[Array[(String, (LoadMetadataDetails, ExecutionErrors))]] { + @throws(classOf[Exception]) + override def call(): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +val carbonLoadModel = getCopyObject(secondaryIndexModel) +// loading, we need to query main table add position reference +val proj = indexCarbonTable.getCreateOrderColumn + .asScala + .map(_.getColName) + .filterNot(_.equals("positionReference")).toSet +val explodeColumn = mainTable.getCreateOrderColumn.asScala + .filter(x => x.getDataType.isComplexType && + proj.contains(x.getColName)) +var dataFrame = dataFrameOfSegments(sc.sparkSession, + mainTable, + proj.mkString(","), + Array(eachSegment)) +// flatten the complex SI +if (explodeColumn.nonEmpty) { + val columns = dataFrame.schema.map { x => +if (x.name.equals(explodeColumn.head.getColName)) { + functions.explode_outer(functions.col(x.name)) +} else { + functions.col(x.name) +} + } + dataFrame = dataFrame.select(columns: _*) +} +val dataLoadSchema = new CarbonDataLoadSchema(indexCarbonTable) +carbonLoadModel.setCarbonDataLoadSchema(dataLoadSchema) +carbonLoadModel.setTableName(indexCarbonTable.getTableName) + carbonLoadModel.setDatabaseName(indexCarbonTable.getDatabaseName) +carbonLoadModel.setTablePath(indexCarbonTable.getTablePath) +carbonLoadModel.setFactTimeStamp(secondaryIndexModel + .segmentIdToLoadStartTimeMapping(eachSegment)) +carbonLoadModel.setSegmentId(eachSegment) +var result: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null +try { + val configuration = FileFactory.getConfiguration + configuration.set(CarbonTableInputFormat.INPUT_SEGMENT_NUMBERS, eachSegment) + def findCarbonScanRDD(rdd: RDD[_]): Unit = { +rdd match { + case d: CarbonScanRDD[_] => Review comment: done ## File path:
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486218311 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel = getCopyObject(secondaryIndexModel) -carbonLoadModel - .setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment)) - carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath) -val secondaryIndexCreationStatus = new CarbonSecondaryIndexRDD(sc.sparkSession, - new SecondaryIndexCreationResultImpl, - carbonLoadModel, - secondaryIndexModel.secondaryIndex, - segId, execInstance, indexCarbonTable, forceAccessSegment, isCompactionCall).collect() + var successSISegments: List[String] = List() + var failedSISegments: List[String] = List() + val sort_scope = indexCarbonTable.getTableInfo.getFactTable.getTableProperties +.get("sort_scope") + if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) { +val mainTable = secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var futureObjectList = List[java.util.concurrent.Future[Array[(String, + 
(LoadMetadataDetails, ExecutionErrors))]]]() +for (eachSegment <- validSegmentList) { + futureObjectList :+= executorService +.submit(new Callable[Array[(String, (LoadMetadataDetails, ExecutionErrors))]] { + @throws(classOf[Exception]) + override def call(): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +val carbonLoadModel = getCopyObject(secondaryIndexModel) +// loading, we need to query main table add position reference +val proj = indexCarbonTable.getCreateOrderColumn + .asScala + .map(_.getColName) + .filterNot(_.equals("positionReference")).toSet +val explodeColumn = mainTable.getCreateOrderColumn.asScala + .filter(x => x.getDataType.isComplexType && + proj.contains(x.getColName)) +var dataFrame = dataFrameOfSegments(sc.sparkSession, + mainTable, + proj.mkString(","), + Array(eachSegment)) +// flatten the complex SI +if (explodeColumn.nonEmpty) { + val columns = dataFrame.schema.map { x => +if (x.name.equals(explodeColumn.head.getColName)) { + functions.explode_outer(functions.col(x.name)) +} else { + functions.col(x.name) +} + } + dataFrame = dataFrame.select(columns: _*) +} +val dataLoadSchema = new CarbonDataLoadSchema(indexCarbonTable) +carbonLoadModel.setCarbonDataLoadSchema(dataLoadSchema) +carbonLoadModel.setTableName(indexCarbonTable.getTableName) + carbonLoadModel.setDatabaseName(indexCarbonTable.getDatabaseName) +carbonLoadModel.setTablePath(indexCarbonTable.getTablePath) +carbonLoadModel.setFactTimeStamp(secondaryIndexModel + .segmentIdToLoadStartTimeMapping(eachSegment)) +carbonLoadModel.setSegmentId(eachSegment) +var result: Array[(String, (LoadMetadataDetails, ExecutionErrors))] = null +try { + val configuration = FileFactory.getConfiguration + configuration.set(CarbonTableInputFormat.INPUT_SEGMENT_NUMBERS, eachSegment) + def findCarbonScanRDD(rdd: RDD[_]): Unit = { +rdd match { + case d: CarbonScanRDD[_] => +d.setValidateSegmentToAccess(false) +
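The findCarbonScanRDD helper quoted above pattern-matches its way through an RDD lineage to reach the CarbonScanRDD and clear its segment-validation flag. The recursive traversal can be sketched over a generic node tree (all types and names here are stand-ins, not Carbon or Spark classes):

```java
import java.util.ArrayList;
import java.util.List;

public class LineageWalk {
    // Stand-in for an RDD node with parent dependencies.
    static class Node {
        final String name;
        final List<Node> parents = new ArrayList<>();
        boolean validateSegments = true;
        Node(String name) { this.name = name; }
    }

    // Depth-first walk: when a node matching the target name is found,
    // clear its flag (analogous to setValidateSegmentToAccess(false)),
    // then keep descending so every matching node in the lineage is covered.
    static void disableValidation(Node node, String target) {
        if (node.name.equals(target)) {
            node.validateSegments = false;
        }
        for (Node parent : node.parents) {
            disableValidation(parent, target);
        }
    }

    public static void main(String[] args) {
        Node scan = new Node("CarbonScanRDD");
        Node map = new Node("MapPartitionsRDD");
        map.parents.add(scan);
        disableValidation(map, "CarbonScanRDD");
        System.out.println(scan.validateSegments); // now false
    }
}
```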
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486217971 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ## @@ -95,6 +95,8 @@ class CarbonScanRDD[T: ClassTag]( private var readCommittedScope: ReadCommittedScope = _ + private var validateSegmentToAccess: Boolean = true Review comment: done
[jira] [Created] (CARBONDATA-3979) Added Hive local dictionary support example
SHREELEKHYA GAMPA created CARBONDATA-3979: - Summary: Added Hive local dictionary support example Key: CARBONDATA-3979 URL: https://issues.apache.org/jira/browse/CARBONDATA-3979 Project: CarbonData Issue Type: Bug Reporter: SHREELEKHYA GAMPA To verify local dictionary support in Hive for the carbon tables created from Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] kunal642 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
kunal642 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690124448 retest this please
[GitHub] [carbondata] kunal642 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
kunal642 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690123684 @QiangCai build passed
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486204559 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -428,4 +552,40 @@ object SecondaryIndexCreator { } threadPoolSize } + + def dataFrameOfSegments( + sparkSession: SparkSession, + carbonTable: CarbonTable, + projections: String, + segments: Array[String]): DataFrame = { +try { + CarbonUtils +.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + + carbonTable.getDatabaseName + CarbonCommonConstants.POINT + + carbonTable.getTableName, + segments.mkString(",")) + val logicalPlan = sparkSession +.sql(s"select $projections from ${ carbonTable.getDatabaseName }.${ + carbonTable +.getTableName +}") Review comment: Moved. These line breaks were produced by the IDE reformat command itself (Ctrl+Alt+Shift+L), so we either need the formatter configured correctly or should not use it.
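dataFrameOfSegments quoted above sets CARBON_INPUT_SEGMENTS as a thread-scoped property before running the query, so the segment restriction applies only to the current thread and can be unset afterwards. The same scoped-property shape can be sketched with a plain ThreadLocal map (helper names are illustrative, not Carbon's API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ScopedProps {
    // Per-thread property map standing in for Carbon's thread-scoped session props.
    private static final ThreadLocal<Map<String, String>> PROPS =
        ThreadLocal.withInitial(HashMap::new);

    static String get(String key) { return PROPS.get().get(key); }

    // Set the property only for the duration of the action, restoring the
    // previous state in finally -- the threadSet / query / threadUnset shape.
    static <T> T withProperty(String key, String value, Supplier<T> action) {
        Map<String, String> map = PROPS.get();
        String previous = map.put(key, value);
        try {
            return action.get();
        } finally {
            if (previous == null) map.remove(key); else map.put(key, previous);
        }
    }

    public static void main(String[] args) {
        String seen = withProperty("carbon.input.segments.db.t1", "0,1",
            () -> get("carbon.input.segments.db.t1"));
        System.out.println(seen);                               // visible inside the scope
        System.out.println(get("carbon.input.segments.db.t1")); // cleared afterwards
    }
}
```

Restoring in finally matters here: segment loads run in an executor-service thread pool, so a leaked property would contaminate the next task scheduled on the same thread.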
[GitHub] [carbondata] Indhumathi27 opened a new pull request #3918: [WIP] Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver
Indhumathi27 opened a new pull request #3918: URL: https://github.com/apache/carbondata/pull/3918 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
ajantha-bhat commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486196569 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { + val globalSortPartitionsProp = propertiesMap("global_sort_partitions") + var pass = false + try { +val globalSortPartitions = Integer.parseInt(globalSortPartitionsProp) +if (globalSortPartitions > 0) { + pass = true +} + } catch { +case _ => + } + if (!pass) { Review comment: I feel keeping a flag and handling the error in one place is better.
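The comment argues for a single boolean flag so that both failure modes, a non-numeric value and a non-positive value, funnel into one shared error site. A minimal sketch of that shape (class and exception type are illustrative; the error message mirrors the one asserted in the test quoted earlier):

```java
public class GlobalSortPartitionsCheck {
    // Validate global_sort_partitions: a value that fails to parse and a
    // value <= 0 both leave the flag false and hit the single error below,
    // the structure the review comment recommends.
    static void validateGlobalSortPartitions(String value) {
        boolean valid = false;
        try {
            valid = Integer.parseInt(value) > 0;
        } catch (NumberFormatException ignored) {
            // fall through: valid stays false
        }
        if (!valid) {
            throw new IllegalArgumentException(
                "Table property global_sort_partitions : " + value + " is invalid");
        }
    }

    public static void main(String[] args) {
        validateGlobalSortPartitions("3"); // passes silently
        try {
            validateGlobalSortPartitions("-1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```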
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
CarbonDataQA1 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690105564 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4033/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3787: [CARBONDATA-3923] support global sort for SI
CarbonDataQA1 commented on pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#issuecomment-690103617 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2294/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #3787: [CARBONDATA-3923] support global sort for SI
akashrn5 commented on a change in pull request #3787: URL: https://github.com/apache/carbondata/pull/3787#discussion_r486083368 ## File path: integration/spark/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ## @@ -801,6 +802,26 @@ object CommonUtil { } } + def validateGlobalSortPartitions(propertiesMap: mutable.Map[String, String]): Unit = { +if (propertiesMap.get("global_sort_partitions").isDefined) { Review comment: replace with `propertiesMap.contains("global_sort_partitions")` ## File path: docs/index/secondary-index-guide.md ## @@ -84,7 +84,8 @@ EXPLAIN SELECT a from maintable where c = 'cd'; 'carbondata' PROPERTIES('table_blocksize'='1') ``` - + **NOTE**: + * supported properties are table_blocksize, column_meta_cache, cache_level, carbon.column.compressor, sort_scope, global_sort_partitions Review comment: ```suggestion * supported properties are table_blocksize, column_meta_cache, cache_level, carbon.column.compressor, sort_scope and global_sort_partitions. ``` ## File path: integration/spark/src/main/scala/org/apache/spark/sql/index/CarbonIndexUtil.scala ## @@ -297,6 +298,10 @@ object CarbonIndexUtil { segmentIdToLoadStartTimeMapping = scala.collection.mutable .Map((carbonLoadModel.getSegmentId, carbonLoadModel.getFactTimeStamp)) } +val indexCarbonTable = CarbonEnv.getCarbonTable( Review comment: index table object is already present, please remove this ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -152,68 +158,181 @@ object SecondaryIndexCreator { LOGGER.info("spark.dynamicAllocation.maxExecutors property is set to =" + execInstance) } } - var futureObjectList = List[java.util.concurrent.Future[Array[(String, Boolean)]]]() - for (eachSegment <- validSegmentList) { -val segId = eachSegment -futureObjectList :+= executorService.submit(new Callable[Array[(String, Boolean)]] { - @throws(classOf[Exception]) - override def call(): Array[(String, Boolean)] = { - 
ThreadLocalSessionInfo.getOrCreateCarbonSessionInfo().getNonSerializableExtraInfo - .put("carbonConf", SparkSQLUtil.sessionState(sc.sparkSession).newHadoopConf()) -var eachSegmentSecondaryIndexCreationStatus: Array[(String, Boolean)] = Array.empty -CarbonLoaderUtil.checkAndCreateCarbonDataLocation(segId, indexCarbonTable) -val carbonLoadModel = getCopyObject(secondaryIndexModel) -carbonLoadModel - .setFactTimeStamp(secondaryIndexModel.segmentIdToLoadStartTimeMapping(eachSegment)) - carbonLoadModel.setTablePath(secondaryIndexModel.carbonTable.getTablePath) -val secondaryIndexCreationStatus = new CarbonSecondaryIndexRDD(sc.sparkSession, - new SecondaryIndexCreationResultImpl, - carbonLoadModel, - secondaryIndexModel.secondaryIndex, - segId, execInstance, indexCarbonTable, forceAccessSegment, isCompactionCall).collect() + var successSISegments: List[String] = List() + var failedSISegments: List[String] = List() + val sort_scope = indexCarbonTable.getTableInfo.getFactTable.getTableProperties +.get("sort_scope") + if (sort_scope != null && sort_scope.equalsIgnoreCase("global_sort")) { +val mainTable = secondaryIndexModel.carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable +var futureObjectList = List[java.util.concurrent.Future[Array[(String, + (LoadMetadataDetails, ExecutionErrors))]]]() +for (eachSegment <- validSegmentList) { + futureObjectList :+= executorService +.submit(new Callable[Array[(String, (LoadMetadataDetails, ExecutionErrors))]] { + @throws(classOf[Exception]) + override def call(): Array[(String, (LoadMetadataDetails, ExecutionErrors))] = { +val carbonLoadModel = getCopyObject(secondaryIndexModel) +// loading, we need to query main table add position reference +val proj = indexCarbonTable.getCreateOrderColumn Review comment: rename to projections ## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/SecondaryIndexCreator.scala ## @@ -428,4 +552,40 @@ object SecondaryIndexCreator { } threadPoolSize } + + def 
dataFrameOfSegments( + sparkSession: SparkSession, + carbonTable: CarbonTable, + projections: String, + segments: Array[String]): DataFrame = { +try { + CarbonUtils +.threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS + + carbonTable.getDatabaseName + CarbonCommonConstants.POINT + + carbonTable.getTableName, +
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-690060524 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2293/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3917: [CARBONDATA-3978] Clean files refactor and added support for a trash folder where all the carbondata files will be copied to after
CarbonDataQA1 commented on pull request #3917: URL: https://github.com/apache/carbondata/pull/3917#issuecomment-690056910 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4032/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690030669 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2291/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690026266 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4029/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3902: [CARBONDATA-3961] reorder filter expression based on storage ordinal
CarbonDataQA1 commented on pull request #3902: URL: https://github.com/apache/carbondata/pull/3902#issuecomment-690025455 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4030/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3908: [CARBONDATA-3967] cache partition on select to enable faster pruning
CarbonDataQA1 commented on pull request #3908: URL: https://github.com/apache/carbondata/pull/3908#issuecomment-690020844 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2290/