[GitHub] [carbondata] brijoobopanna commented on pull request #3818: [Carbondata-3883] Added filtering for the deleted rows for local dictionary fields
brijoobopanna commented on pull request #3818: URL: https://github.com/apache/carbondata/pull/3818#issuecomment-652208801 add to whitelist This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (CARBONDATA-3885) In the case when in the SI table a segment is deleted and it's entry is deleted from the tablestatus file, during next load command, load into SI fails.
Vikram Ahuja created CARBONDATA-3885:

Summary: In the case when in the SI table a segment is deleted and it's entry is deleted from the tablestatus file, during next load command, load into SI fails.
Key: CARBONDATA-3885
URL: https://issues.apache.org/jira/browse/CARBONDATA-3885
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 2.0.0
Reporter: Vikram Ahuja
Fix For: 2.0.0

When a segment is deleted from the SI table and its entry is also removed from the tablestatus file, the next load command fails to load into the SI.

Steps to reproduce this issue:
1. Create table a.
2. Create an SI on table a. Let’s call it a_index.
3. Insert/Load 3-4 times into the main table (a).
4. Check the segments in both tables (they should be the same).
5. Delete segments from HDFS at the path SI_table/Fact/Part0/Segment_number.
6. Also delete the entries for those segments from the table status file (Metadata/tablestatus).
7. Check the segments in both tables again (the SI table will have fewer segments).
8. Do another load/insert. The new segment is inserted into both tables, but the deleted segments are not loaded into the SI.
9. Run the command "alter table DBName.SIName set SERDEPROPERTIES('isSITableEnabled'='false');"
10. Run show index on the main table; the SI will be in disabled mode.
11. Do another load/insert. The segment is inserted into both tables. The previously deleted segments should be added to the SI table, but the load fails at this step.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3884) During Concurrent loads in main table with SI table with isSITableEnabled = false, one of the concurrent load fails
Vikram Ahuja created CARBONDATA-3884:

Summary: During Concurrent loads in main table with SI table with isSITableEnabled = false, one of the concurrent load fails
Key: CARBONDATA-3884
URL: https://issues.apache.org/jira/browse/CARBONDATA-3884
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 2.0.0
Reporter: Vikram Ahuja
Fix For: 2.0.0

During concurrent loads into a main table that has an SI table with isSITableEnabled = false, one of the concurrent loads fails.

The steps are as follows:
1. Create a main table.
2. Create an SI table.
3. Load into the main table.
4. Alter the SI table to set 'isSITableEnabled'='false'.
5. Change SILoadEventListener so that it sleeps for some minutes after getting the main table details.
6. When concurrent loads are fired, a load gets the main table details, then sleeps, and the load fails after some time.
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3815: URL: https://github.com/apache/carbondata/pull/3815#issuecomment-652000933 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3279/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-652000700 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3280/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3815: URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651999614 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1542/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3819: [CARBONDATA-3855]support carbon SDK to load data from different files
CarbonDataQA1 commented on pull request #3819: URL: https://github.com/apache/carbondata/pull/3819#issuecomment-651998676 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1543/
[jira] [Commented] (CARBONDATA-3883) Table result shows invalid data for local diction column In Presto
[ https://issues.apache.org/jira/browse/CARBONDATA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148770#comment-17148770 ]

Nitin Kashyap commented on CARBONDATA-3883:

Added PR with fix [https://github.com/apache/carbondata/pull/3818]

> Table result shows invalid data for local diction column In Presto
>
> Key: CARBONDATA-3883
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3883
> Project: CarbonData
> Issue Type: Bug
> Components: presto-integration
> Affects Versions: 2.0.0
> Reporter: Nitin Kashyap
> Priority: Minor
> Labels: newbie
> Attachments: image-2020-06-30-20-20-53-267.png, image-2020-06-30-20-21-06-026.png
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Record value for string fields is inaccurate when read from presto for table with local dictionary is enabled;
>
> *Step 1:* Create a table with local dictionary enabled with string column.
> {code:sql}
> CREATE TABLE testorders18(orderkey bigint, orderstatus varchar(7), totalprice double, orderdate date) STORED BY 'org.apache.carbondata.format' ;{code}
> *Step 2:* Insert records, and delete such a way that a delete delta is created (i.e. segment still has some records present)
> {code:sql}
> INSERT INTO testorders18 VALUES (11,'FAILURE', 125.15, DATE'2019-05-17'), (12,'SUCCESS', 135.12, DATE'2019-05-20'),(13,'1FAILURE', 125.15, DATE'2019-05-17'), (14,'SUCCESS', 135.12, DATE'2019-05-20');
> DELETE FROM testorders18 WHERE orderkey IN (SELECT orderkey FROM testorders18 WHERE orderstatus= 'FAILURE');
> {code}
> !image-2020-06-30-20-20-53-267.png!
> *Step 3:* Read the result in presto with pushdown row filter enabled
> {code:bash}
> -Dcarbon.push.rowfilters.for.vector=true{code}
> {code:sql}
> select * from testorders18;{code}
> !image-2020-06-30-20-21-06-026.png!
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
CarbonDataQA1 commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651858622 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3278/
[jira] [Created] (CARBONDATA-3883) Table result shows invalid data for local diction column In Presto
Nitin Kashyap created CARBONDATA-3883:

Summary: Table result shows invalid data for local diction column In Presto
Key: CARBONDATA-3883
URL: https://issues.apache.org/jira/browse/CARBONDATA-3883
Project: CarbonData
Issue Type: Bug
Components: presto-integration
Affects Versions: 2.0.0
Reporter: Nitin Kashyap
Attachments: image-2020-06-30-20-20-53-267.png, image-2020-06-30-20-21-06-026.png

Record values for string fields are inaccurate when read from Presto for a table with local dictionary enabled.

*Step 1:* Create a table with local dictionary enabled and a string column.
{code:sql}
CREATE TABLE testorders18(orderkey bigint, orderstatus varchar(7), totalprice double, orderdate date) STORED BY 'org.apache.carbondata.format';
{code}
*Step 2:* Insert records, then delete in such a way that a delete delta is created (i.e. the segment still has some records present).
{code:sql}
INSERT INTO testorders18 VALUES (11,'FAILURE', 125.15, DATE'2019-05-17'), (12,'SUCCESS', 135.12, DATE'2019-05-20'),(13,'1FAILURE', 125.15, DATE'2019-05-17'), (14,'SUCCESS', 135.12, DATE'2019-05-20');
DELETE FROM testorders18 WHERE orderkey IN (SELECT orderkey FROM testorders18 WHERE orderstatus= 'FAILURE');
{code}
!image-2020-06-30-20-20-53-267.png!
*Step 3:* Read the result in Presto with the pushdown row filter enabled.
{code:bash}
-Dcarbon.push.rowfilters.for.vector=true
{code}
{code:sql}
select * from testorders18;
{code}
!image-2020-06-30-20-21-06-026.png!
[GitHub] [carbondata] brijoobopanna commented on pull request #3815: [CARBONDATA-3855]support carbon SDK to load data from different files
brijoobopanna commented on pull request #3815: URL: https://github.com/apache/carbondata/pull/3815#issuecomment-651817861 @nihal0107 please elaborate the PR description, as it is a requirement; check some old requirement PRs such as 3478
[GitHub] [carbondata] marchpure commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
marchpure commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651771840 retest this please
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
CarbonDataQA1 commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651763696 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1539/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
CarbonDataQA1 commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651763251 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3275/
[GitHub] [carbondata] QiangCai commented on a change in pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
QiangCai commented on a change in pull request #3816:
URL: https://github.com/apache/carbondata/pull/3816#discussion_r447634675

## File path: core/src/main/java/org/apache/carbondata/core/index/TableIndex.java ##
@@ -206,7 +206,8 @@ public CarbonTable getTable() {
       Set partitionLocations, List blocklets, Map> indexes) throws IOException {
     for (Segment segment : segments) {
-      if (indexes.get(segment).isEmpty() || indexes.get(segment) == null) {
+      if (segment == null ||
+          indexes.get(segment).isEmpty() || indexes.get(segment) == null) {

Review comment:
change to: segment == null || indexes.get(segment) == null || indexes.get(segment).isEmpty()
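The ordering requested in the review comment above matters because `||` short-circuits left to right: calling `isEmpty()` before the null test throws a NullPointerException when the map has no entry for the segment. Below is a minimal, self-contained sketch of that ordering. The class `NullSafeIndexCheck`, the method `shouldSkip`, and the use of `String` keys in place of `Segment` are hypothetical stand-ins for illustration, not the actual TableIndex code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the loop body in TableIndex; String replaces Segment.
class NullSafeIndexCheck {

  // Returns true when the segment should be skipped: the segment is null,
  // the map has no entry for it, or the entry is an empty list.
  // The null test on the looked-up list comes before isEmpty().
  static boolean shouldSkip(String segment, Map<String, List<String>> indexes) {
    if (segment == null) {
      return true;
    }
    List<String> segmentIndexes = indexes.get(segment);
    return segmentIndexes == null || segmentIndexes.isEmpty();
  }

  public static void main(String[] args) {
    Map<String, List<String>> indexes = new HashMap<>();
    indexes.put("0", Arrays.asList("idx-a"));
    indexes.put("1", new ArrayList<>());

    System.out.println(shouldSkip("0", indexes));  // false: entry present and non-empty
    System.out.println(shouldSkip("1", indexes));  // true: entry is empty
    System.out.println(shouldSkip("2", indexes));  // true: no entry, no exception
    System.out.println(shouldSkip(null, indexes)); // true: null segment
  }
}
```

Looking the list up once and storing it in a local variable also avoids the repeated `indexes.get(segment)` calls of the one-line condition.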
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [CARBONDATA-3882] Fix wrong lock and missing Table status lock in some SI flows
CarbonDataQA1 commented on pull request #3810: URL: https://github.com/apache/carbondata/pull/3810#issuecomment-651697959 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1538/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3810: [CARBONDATA-3882] Fix wrong lock and missing Table status lock in some SI flows
CarbonDataQA1 commented on pull request #3810: URL: https://github.com/apache/carbondata/pull/3810#issuecomment-651693999 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3274/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651680808 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1537/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [CARBONDATA-3864] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-651679865 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3273/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
CarbonDataQA1 commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651671108 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3271/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3816: [CARBONDATA-3879] Filtering Segments Optimazation
CarbonDataQA1 commented on pull request #3816: URL: https://github.com/apache/carbondata/pull/3816#issuecomment-651669395 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1535/
[jira] [Created] (CARBONDATA-3882) Wrong lock and missing Table status lock in some SI flows
Ajantha Bhat created CARBONDATA-3882:

Summary: Wrong lock and missing Table status lock in some SI flows
Key: CARBONDATA-3882
URL: https://issues.apache.org/jira/browse/CARBONDATA-3882
Project: CarbonData
Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat

problem:
1) In updateLoadMetadataWithMergeStatus, we want to update the SI table status, but the lock is acquired on the main table.
2) In triggerCompaction and updateTableStatusForIndexTables, the table status write happens without a lock.
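Both problems above come down to one rule: a table status write should hold the lock of the table whose status is being written. The sketch below illustrates that rule only. The class `TableStatusLocks` and its methods are hypothetical, and the in-memory `ReentrantLock` per table name is an assumption made for the example; it is not CarbonData's actual locking API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory model: one lock per table name guards that table's status.
class TableStatusLocks {
  private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
  private final Map<String, String> status = new ConcurrentHashMap<>();

  private ReentrantLock lockFor(String tableName) {
    return locks.computeIfAbsent(tableName, k -> new ReentrantLock());
  }

  // Writes the status of tableName while holding the lock of that same table,
  // so an SI status update takes the SI table's lock, not the main table's.
  void writeStatus(String tableName, String newStatus) {
    ReentrantLock lock = lockFor(tableName);
    lock.lock();
    try {
      status.put(tableName, newStatus);
    } finally {
      lock.unlock(); // always released, even if the write throws
    }
  }

  String readStatus(String tableName) {
    return status.get(tableName);
  }

  boolean isLocked(String tableName) {
    return lockFor(tableName).isLocked();
  }

  public static void main(String[] args) {
    TableStatusLocks tables = new TableStatusLocks();
    tables.writeStatus("a_index", "merged");
    System.out.println(tables.readStatus("a_index"));
    System.out.println(tables.isLocked("a_index"));
  }
}
```

Routing every write through a single method that derives the lock from the target table name makes it hard to take the wrong lock or to skip the lock entirely.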
[GitHub] [carbondata] ajantha-bhat commented on pull request #3809: [CARBONDATA-3881] Fix concurrent main table compaction and SI load issue
ajantha-bhat commented on pull request #3809: URL: https://github.com/apache/carbondata/pull/3809#issuecomment-651596439 @akashrn5 : please check
[jira] [Created] (CARBONDATA-3881) concurrent main table compaction and SI load issue
Ajantha Bhat created CARBONDATA-3881:

Summary: concurrent main table compaction and SI load issue
Key: CARBONDATA-3881
URL: https://issues.apache.org/jira/browse/CARBONDATA-3881
Project: CarbonData
Issue Type: Bug
Reporter: Ajantha Bhat
Assignee: Ajantha Bhat

problem: Consider a scenario where segmentX has been loaded to the main table but failed to load to the SI table. While loading another segment, segmentY, we reload the failed SI segmentX. If by this time segmentX has been compacted in the main table and clean files has been executed on it, the SI load fails: segmentX is not found in the segmentMap of the SI, and an exception is thrown.

solution: Just before reloading the failed SI segment, check whether it is still a valid segment in the main table.
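The proposed solution can be pictured as a filter applied just before the reload. The following is a minimal sketch under stated assumptions: `SiReloadFilter`, `segmentsToReload`, and the use of plain string segment ids are hypothetical names chosen for illustration and are not taken from the CarbonData code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides which failed SI segments may still be reloaded.
class SiReloadFilter {

  // Keeps only the failed SI segments that are still valid in the main table.
  // Segments that were compacted and cleaned in the meantime are dropped,
  // so the reload never looks up a segment that no longer exists.
  static List<String> segmentsToReload(List<String> failedSiSegments,
                                       Set<String> validMainSegments) {
    List<String> reloadable = new ArrayList<>();
    for (String segmentId : failedSiSegments) {
      if (validMainSegments.contains(segmentId)) {
        reloadable.add(segmentId);
      }
    }
    return reloadable;
  }

  public static void main(String[] args) {
    // Segment "2" failed in the SI, but was later compacted and cleaned in the main table.
    List<String> failed = Arrays.asList("2", "3");
    Set<String> validInMain = new HashSet<>(Arrays.asList("3", "4"));
    System.out.println(segmentsToReload(failed, validInMain)); // [3]
  }
}
```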
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3771: [CARBONDATA-3849] pushdown array_contains filter to carbon for array of primitive types
ajantha-bhat commented on a change in pull request #3771:
URL: https://github.com/apache/carbondata/pull/3771#discussion_r447457038

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala ##
@@ -865,6 +869,27 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
         Some(CarbonContainsWith(c))
       case c@Literal(v, t) if (v == null) =>
         Some(FalseExpr())
+      case c@ArrayContains(a: Attribute, Literal(v, t)) =>
+        a.dataType match {
+          case arrayType: ArrayType =>
+            arrayType.elementType match {
+              case StringType => Some(sources.EqualTo(a.name, v))

Review comment:
I want to reuse the existing equalsTo code; I don't see any advantage in making a new expression.

## File path: integration/spark/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala ##
@@ -152,13 +152,25 @@ object CarbonFilters {
       }
       def getCarbonExpression(name: String) = {

Review comment:
I want to reuse the existing equalsTo code; I don't see any advantage in making a new expression.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m
ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447455176

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala ##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     }
   }
+  override def afterEach(): Unit = {
+    CarbonProperties.getInstance()

Review comment:
If a test case fails, we need to fix the test case or the code. Just because a test case can fail, we should not add before each, as it is redundant for the other test cases and it will increase CI time.
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.m
ajantha-bhat commented on a change in pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#discussion_r447455176

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableLoadingTestCase.scala ##
@@ -640,6 +653,10 @@ class StandardPartitionTableLoadingTestCase extends QueryTest with BeforeAndAfte
     }
   }
+  override def afterEach(): Unit = {
+    CarbonProperties.getInstance()

Review comment:
If a test case fails, we need to fix the test case. Just because a test case can fail, we should not add before each, as it is redundant for the other test cases and it will increase CI time.