[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Ravichandran updated CARBONDATA-4029:
----------------------------------------------
Description:

Do a delete on a table which has alter-added SDK segments; afterwards, count(*) returns 0. count(*) remains 0 even if any number of SDK segments are added after the delete.

Test queries:

drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
--before executing the below alter add segment, place the attached SDK files in HDFS under the /sdkfiles/primitive2 folder;
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
select * from external_primitive;
delete from external_primitive where id =2;
select * from external_primitive;

Console output:

/> drop table if exists external_primitive;
No rows selected (1.586 seconds)

/> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
No rows selected (0.774 seconds)

/> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select * from external_primitive;
No rows selected (1.077 seconds)
INFO : Execution ID: 320
| id | name | rank | salary | active | dob | doj | city | dept |
| 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune | IT |
| 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA |
| 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune | DATA |
| 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi | MAINS |
| 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi | IT |
| 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA |
| 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | IT |
| 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA |
| 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune | DATA |
| 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
| 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT |
| 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | DATA |
| 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
13 rows selected (2.458 seconds)

/> delete from external_primitive where id =2;select * from external_primitive;
INFO : Execution ID: 322
| Deleted Row Count |
| 1 |
1 row selected (3.723 seconds)
| id | name | rank | salary | active | dob | doj | city | dept |
No rows selected (1.531 seconds)

/> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select * from external_primitive;
No rows selected (0.766 seconds)
| id | name | rank | salary | active | dob | doj | city | dept |
No rows selected (1.439 seconds)

/> select count(*) from external_primitive;
INFO : Execution ID: 335
| count(1) |
| 0 |
1 row selected (1.278 seconds)

> After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
> ----------------------------------------------
>
>                 Key: CARBONDATA-4029
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
>             Project: CarbonData
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: 3 node FI cluster
>            Reporter: Prasanna Ravichandran
>
[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Ravichandran updated CARBONDATA-4029:
----------------------------------------------
Description:
(was:
We are getting a NumberFormatException while querying on the date columns. The SDK files are also attached.

Test queries:

--SDK compaction;
drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');

alter table external_primitive compact 'minor'; --working fine, pass;
select count(*) from external_primitive; --working fine, pass;
show segments for table external_primitive;
select * from external_primitive limit 13; --working fine, pass;
select * from external_primitive limit 14; --failed, getting NumberFormatException;
select min(dob) from external_primitive; --failed, getting NumberFormatException;
select max(dob) from external_primitive; --working;
select dob from external_primitive; --failed, getting NumberFormatException;

Console:

*0: /> show segments for table external_primitive;*
| ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format |
| 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | columnar_v3 |
| 3 | Compacted | 2020-10-13 11:52:00.587 | 0.828S | {} | 1.88KB | 655.0B | columnar_v3 |
| 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | columnar_v3 |
| 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | columnar_v3 |
| 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | columnar_v3 |
| 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | columnar_v3 |
6 rows selected (0.45 seconds)

*0: /> select * from external_primitive limit 13;* --working fine, pass;
INFO : Execution ID: 95
| id | name | rank | salary | active | dob | doj | city | dept |
| 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune | IT |
| 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA |
| 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune | DATA |
| 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi | MAINS |
| 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi | IT |
| 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA |
| 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | IT |
| 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA |
| 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune | DATA |
| 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
| 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT |
| 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi | DATA |
| 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 09:30:20.0 | Bangalore | MAINS |
13 rows selected (1.775 seconds)

*0: /> select * from external_primitive limit 14;* --failed, getting NumberFormatException;
INFO : Execution ID: 97
*java.lang.NumberFormatException: For input string: "776"*
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:569)
    at java.lang.Integer.parseInt(Integer.java:615)
    at java.sql.Date.valueOf(Date.java:133)
    at
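The trace shows the exception surfacing from java.sql.Date.valueOf, which parses the yyyy-[m]m-[d]d fields of a date string with Integer.parseInt, so any field that is not a plain integer is reported as a NumberFormatException rather than a clearer parse error. A minimal standalone sketch of that behavior (the malformed input string below is illustrative only; the actual corrupted value behind "776" is not recoverable from the trace):

```java
import java.sql.Date;

public class DateValueOfDemo {
    public static void main(String[] args) {
        // A well-formed "yyyy-[m]m-[d]d" string parses fine.
        Date ok = Date.valueOf("1979-12-09");
        System.out.println(ok); // 1979-12-09

        // A date string whose year/month/day field is not a plain integer
        // makes the Integer.parseInt calls inside Date.valueOf throw
        // NumberFormatException -- the same exception type as in the trace.
        try {
            Date.valueOf("20x0-05-01");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```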
[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanna Ravichandran updated CARBONDATA-4029:
----------------------------------------------
Summary: After delete in the table which has Alter-added SDK segments, then the count(*) is 0. (was: Getting Number format exception while querying on date columns in SDK carbon table.)

> After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
> ----------------------------------------------
>
>                 Key: CARBONDATA-4029
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
>             Project: CarbonData
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: 3 node FI cluster
>            Reporter: Prasanna Ravichandran
>            Priority: Minor
>         Attachments: Primitive.rar
>
> We are getting Number format exception while querying on the date columns. Attached the SDK files also.
> Test queries:
> --SDK compaction;
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
>
> alter table external_primitive compact 'minor'; --working fine pass;
> select count(*) from external_primitive; --working fine pass;
> show segments for table external_primitive;
> select * from external_primitive limit 13; --working fine pass;
> select * from external_primitive limit 14; --failed getting number format exception;
> select min(dob) from external_primitive; --failed getting number format exception;
> select max(dob) from external_primitive; --working;
> select dob from external_primitive; --failed getting number format exception;
> Console:
> *0: /> show segments for table external_primitive;*
> | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format |
> | 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | columnar_v3 |
> | 3 | Compacted | 2020-10-13 11:52:00.587 | 0.828S | {} | 1.88KB | 655.0B | columnar_v3 |
> | 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | columnar_v3 |
> | 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | columnar_v3 |
> | 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | columnar_v3 |
> | 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | columnar_v3 |
> 6 rows selected (0.45 seconds)
> *0: /> select * from external_primitive limit 13;* --working fine pass;
> INFO : Execution ID: 95
> | id | name | rank | salary | active | dob | doj | city | dept |
> | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune | IT |
> | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA |
> | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune | DATA |
> | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi | MAINS |
> | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi | IT |
> | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA |
> | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | IT |
> | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA |
> | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune | DATA |
> | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
>
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719473886

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2979/

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719473427

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4737/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #4000: [WIP][CARBONDATA-4020]Fixed drop index when multiple index exists
CarbonDataQA1 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-719435945

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2978/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #4000: [WIP][CARBONDATA-4020]Fixed drop index when multiple index exists
CarbonDataQA1 commented on pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#issuecomment-719433040

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4736/
[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514948928

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/TestPruneUsingSegmentMinMax.scala
## @@ -103,7 +103,7 @@ class TestPruneUsingSegmentMinMax extends QueryTest with BeforeAndAfterAll {
     sql("update carbon set(a)=(10) where a=1").collect()
     checkAnswer(sql("select count(*) from carbon where a=10"), Seq(Row(3)))
     showCache = sql("show metacache on table carbon").collect()
-    assert(showCache(0).get(2).toString.equalsIgnoreCase("6/8 index files cached"))
+    assert(showCache(0).get(2).toString.equalsIgnoreCase("1/6 index files cached"))

Review comment: The reason why there is only 1 index file cached: for now, horizontal compaction triggers IndexStoreManager.getInstance().clearInvalidSegments, which also clears the index of all valid segments; others are handling this performance issue.
[GitHub] [carbondata] marchpure commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r514946130

## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
## @@ -340,7 +340,8 @@ private[sql] case class CarbonProjectForUpdateCommand(
       case _ => sys.error("")
     }
-    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments)
+    val updateTableModel = UpdateTableModel(true, currentTime, executorErrors, deletedSegments,
+      !carbonRelation.carbonTable.isHivePartitionTable)

Review comment: Done, please have a review.
[GitHub] [carbondata] QiangCai commented on a change in pull request #3912: [CARBONDATA-3977] Global sort partitions should be determined dynamically
QiangCai commented on a change in pull request #3912:
URL: https://github.com/apache/carbondata/pull/3912#discussion_r514942737

## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala
## @@ -527,6 +529,73 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo
     assert(sql("select * from carbon_global_sort_update").count() == 22)
   }

+  test("calculate the global sort partitions automatically when user does not give in load options ") {

Review comment: Add more test cases that configure a small defaultMaxSplitBytes, so that the load ends up with multiple partitions.
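The idea behind the PR under review, deriving the global sort partition count from the input size when the user gives none in the load options, comes down to a ceiling division of total input bytes by the max split size. A rough, hypothetical sketch of that arithmetic (the method name estimatePartitions and the shape of the calculation are illustrative assumptions, not CarbonData's actual load-command code):

```java
public class GlobalSortPartitions {
    /**
     * Estimate a partition count from total input size and max split size,
     * mirroring the "totalSize / defaultMaxSplitBytes, at least 1" idea.
     * Hypothetical helper for illustration; not the CarbonData implementation.
     */
    static int estimatePartitions(long totalInputBytes, long maxSplitBytes) {
        if (totalInputBytes <= 0) {
            return 1; // always launch at least one task
        }
        // Ceiling division: every started split gets its own partition.
        long partitions = (totalInputBytes + maxSplitBytes - 1) / maxSplitBytes;
        return (int) Math.min(partitions, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long maxSplitBytes = 128L * 1024 * 1024; // Spark's default 128 MB split size
        System.out.println(estimatePartitions(300L * 1024 * 1024, maxSplitBytes)); // 3

        // A small defaultMaxSplitBytes, as the review suggests for tests,
        // yields multiple partitions even for tiny input.
        System.out.println(estimatePartitions(10 * 1024, 4 * 1024)); // 3
    }
}
```

This also shows why the reviewer's suggestion works: shrinking the split size is the easiest way to force the multiple-partition code path in a unit test without generating large input data.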
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-719410167

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2976/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder
CarbonDataQA1 commented on pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#issuecomment-71941

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4735/
[GitHub] [carbondata] nihal0107 opened a new pull request #4000: [WIP][CARBONDATA-4020]Fixed drop index when multiple index exists
nihal0107 opened a new pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000

### Why is this PR needed?
Currently, when we have multiple bloom indexes and we drop one index, the 'show index' command shows an empty index list.

### What changes were proposed in this PR?
Checked: if there is no CG or FG index, then set indexExist as true.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
[GitHub] [carbondata] ajantha-bhat commented on pull request #3972: [CARBONDATA-4042]Launch same number of task as select query for insert into select and ctas cases when target table is of no_sort
ajantha-bhat commented on pull request #3972:
URL: https://github.com/apache/carbondata/pull/3972#issuecomment-719310576

@VenuReddy2103: Do you have a performance benchmark with this change? When I once tried sending a no-sort load [1 node, 1 task] through the global sort flow [which launches more tasks], I observed a performance degradation for a 15GB TPCH lineitem table insert. So I suggest you check the performance with this change.