[jira] [Resolved] (CARBONDATA-4282) Issues with table having complex columns related to long string, SI, local dictionary

2021-09-08 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4282.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Issues with table having complex columns related to long string, SI, local 
> dictionary
> -
>
> Key: CARBONDATA-4282
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4282
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *1. Insert/load fails after ALTER ADD of a complex column if the table
> contains long string columns.*
>   [Steps] :-  
> DROP TABLE IF EXISTS alter_com;
> CREATE TABLE alter_com(intfield int,EDUCATED string ,rankk string ) STORED AS 
> carbondata 
> TBLPROPERTIES('inverted_index'='intField','sort_columns'='intField','TABLE_BLOCKSIZE'=
>  '256 
> MB','TABLE_BLOCKLET_SIZE'='8','SORT_SCOPE'='no_sort','COLUMN_META_CACHE'='rankk','carbon.column.compressor'='gzip','long_string_columns'='rankk','table_page_size_inmb'='1');
> insert into alter_com values(1,'cse','xi'); select * from alter_com limit 1;
> ALTER TABLE alter_com ADD COLUMNS(map1 Map, map2 Map, 
> map3 Map, map4 Map, map5 
> Map,map6 Map,map7 map>, 
> map8 map>>); 
> ALTER TABLE alter_com SET TBLPROPERTIES('long_string_columns'='EDUCATED');
> insert into alter_com values(1,'ece','x', map(1,2),map(3,2.34), 
> map(1.23,'hello'),map('abc','def'), 
> map(true,'2017-02-01'),map('time','2018-02-01 
> 02:00:00.0'),map('ph',array(1,2)), 
> map('a',named_struct('d',23,'s',named_struct('im','sh';
> [Expected Result] :- Insert/load should succeed after ALTER ADD of a map
> column if the table contains long string columns.
> *2. CREATE INDEX on an array of a complex column (map/struct) throws a null
> pointer exception instead of the correct error message.*
> [Steps] :-
> drop table if exists strarmap1; create table strarmap1(id int,name string,str 
> struct>,arr 
> array>) stored as carbondata 
> tblproperties('inverted_index'='name','sort_columns'='name','TABLE_BLOCKSIZE'=
>  '256 MB','TABLE_BLOCKLET_SIZE'='8','CACHE_LEVEL'='BLOCKLET'); load data 
> inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 
> options('fileheader'='id,name,str,arr','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');
>  CREATE INDEX index2 ON TABLE strarmap1 (arr) as 'carbondata' 
> properties('sort_scope'='global_sort','global_sort_partitions'='3');
> [Expected Result] :- CREATE INDEX on an array of map(string,timestamp)
> should throw the correct validation error message.
> [Actual Issue] :- CREATE INDEX on an array of map(string,timestamp) throws a
> null pointer exception instead of the correct error message.
> *3. ALTER TABLE local dictionary include/exclude with a newly added map
> column fails.*
> [Steps] :-
> drop table if exists strarmap1; create table strarmap1(id int,name string,str 
> struct>,arr 
> array>) stored as carbondata 
> tblproperties('inverted_index'='name','sort_columns'='name','local_dictionary_enable'='false','local_dictionary_include'='map1','local_dictionary_exclude'='str,arr','local_dictionary_threshold'='1000');
> ALTER TABLE strarmap1 ADD COLUMNS(map1 Map, map2 Map, 
> map3 Map, map4 Map, map5 
> Map,map6 Map,map7 map>, 
> map8 map>>); ALTER TABLE strarmap1 
> SET 
> TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','local_dictionary_include'='map4','local_dictionary_threshold'='1000');
> [Expected Result] :- Setting the local dictionary include/exclude table
> property with a newly added map column should succeed.
> [Actual Issue] :- Setting the local dictionary include/exclude table
> property with a newly added map column fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4282) Issues with table having complex columns related to long string, SI, local dictionary

2021-09-08 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4282:
-
Priority: Minor  (was: Major)






[jira] [Resolved] (CARBONDATA-4278) Avoid refetching all indexes to get segment properties

2021-09-01 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4278.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Avoid refetching all indexes to get segment properties
> --
>
> Key: CARBONDATA-4278
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4278
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> h1. Avoid refetching all indexes to get segment properties
>  
> 1) When the block index is already available, there is no need to rebuild it
> from the available segments and partition locations.
> 2) Call getSegmentProperties directly when the block index is available.
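The change described above is essentially a cache check before an expensive rebuild. A minimal Python sketch of the idea, with hypothetical names (`build_block_index` and `get_segment_properties` here are illustrative, not CarbonData's actual APIs):

```python
class SegmentPropertiesCache:
    """Serve segment properties from an already-built block index,
    rebuilding from segments only when nothing is cached yet."""

    def __init__(self, build_block_index):
        self._build = build_block_index  # expensive rebuild function
        self._block_index = None
        self.rebuilds = 0                # counts how often we rebuilt

    def get_segment_properties(self, segments):
        # If the block index is available, use it directly;
        # only rebuild when nothing is cached.
        if self._block_index is None:
            self._block_index = self._build(segments)
            self.rebuilds += 1
        return self._block_index["properties"]


def expensive_build(segments):
    # Stand-in for preparing the block index from segment/partition locations.
    return {"properties": {"segments": sorted(segments)}}


cache = SegmentPropertiesCache(expensive_build)
cache.get_segment_properties(["s0", "s1"])
cache.get_segment_properties(["s0", "s1"])  # second call is served from cache
```

With this shape, repeated callers pay the rebuild cost at most once per cache lifetime.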





[jira] [Updated] (CARBONDATA-4269) [Doc][summer-2021]Update url and description in prestosql-guide.md

2021-08-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4269:
-
Affects Version/s: (was: 2.2.0)
   2.3.0

> [Doc][summer-2021]Update url and description in prestosql-guide.md
> --
>
> Key: CARBONDATA-4269
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4269
> Project: CarbonData
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 2.3.0
>Reporter: chenzhengyu
>Assignee: chenzhengyu
>Priority: Trivial
> Fix For: 2.3.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> PrestoSQL has been renamed to Trino. Because Facebook established the Presto
> Foundation at The Linux Foundation®, the PrestoSQL project had to change its
> name. More information is available
> [here|https://trino.io/blog/2020/12/27/announcing-trino.html].
> [https://prestosql.io/docs/current/installation/deployment.html]
> currently redirects to prestodb instead of prestosql (Trino), so we must fix
> the link and tell users that the PrestoSQL website has migrated to Trino.





[jira] [Updated] (CARBONDATA-4269) [Doc][summer-2021]Update url and description in prestosql-guide.md

2021-08-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4269:
-
Fix Version/s: (was: 2.2.0)
   2.3.0






[jira] [Resolved] (CARBONDATA-4269) [Doc][summer-2021]Update url and description in prestosql-guide.md

2021-08-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4269.
--
Resolution: Fixed






[jira] [Resolved] (CARBONDATA-4242) Improve the carbondata CDC merge performance Phase2

2021-07-29 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4242.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Improve the carbondata CDC merge performance Phase2
> ---
>
> Key: CARBONDATA-4242
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4242
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> Identify the bottleneck and improve the performance





[jira] [Issue Comment Deleted] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-05-31 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Comment: was deleted

(was: [^Transaction manager, time travel, segment interface_v2.pdf])

> Transaction Manager, time travel and segment interface refactoring
> --
>
> Key: CARBONDATA-4171
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
> Project: CarbonData
>  Issue Type: Improvement
>    Reporter: Ajantha Bhat
>Priority: Major
> Attachments: Feature impact analysis with segment refactor, time 
> travel, transaction manager_v1.xlsx, Transaction manager, time travel, 
> segment interface_v1.pdf
>
>
> *Goals:*
> *1) Implement a “Transaction Manager” with optimistic concurrency to provide 
> within a table transaction / versioning.* (interfaces should also be flexible 
> enough to support across table transactions)
> *2) Support time travel in carbonData.*
> *3) Decouple and clean up segment interfaces.* (which should also help in 
> supporting segment concepts to other open format under carbondata metadata 
> service)
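Goal 1's optimistic concurrency amounts to a version check at commit time, and keeping versioned snapshots is what enables goal 2's time travel. A toy Python model under those assumptions (all names are hypothetical, not the proposed interfaces):

```python
class OptimisticTable:
    """Toy optimistic-concurrency commit: a writer snapshots the version,
    prepares its changes, and commits only if the version is unchanged."""

    def __init__(self):
        self.version = 0
        self.snapshots = {0: []}  # version -> committed rows (time travel)

    def begin(self):
        return self.version  # snapshot the current version

    def commit(self, base_version, rows):
        if base_version != self.version:  # someone committed in between
            raise RuntimeError("conflict: retry on top of the new version")
        self.version += 1
        self.snapshots[self.version] = self.snapshots[base_version] + rows
        return self.version

    def read(self, as_of=None):
        # Time travel: read the latest or any historical version.
        v = self.version if as_of is None else as_of
        return self.snapshots[v]


t = OptimisticTable()
v0 = t.begin()
t.commit(v0, ["row1"])           # succeeds: version is still 0
stale = t.begin()                # snapshot at version 1
t.commit(t.begin(), ["row2"])    # another writer commits first (version -> 2)
# t.commit(stale, ["row3"]) would now raise: stale base version
```

A conflicting writer simply retries its change on top of the new version, which is the standard behavior under optimistic concurrency.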





[jira] [Commented] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-05-31 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354426#comment-17354426
 ] 

Ajantha Bhat commented on CARBONDATA-4171:
--

[^Transaction manager, time travel, segment interface_v2.pdf]






[jira] [Resolved] (CARBONDATA-4183) Local sort Partition Load and Compaction improvement

2021-05-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4183.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Local sort Partition Load and Compaction improvement
> 
>
> Key: CARBONDATA-4183
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4183
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, the number of tasks for a partition-table local sort load is
> decided based on the input file size. Because more tasks are launched, the
> data is not properly sorted. For compaction, the number of tasks equals the
> number of partitions. If the data for one partition is huge, compaction can
> fail with OOM under low-memory configurations.
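A toy illustration of why tying the task count to input size hurts local sort quality: each extra task produces one more independently sorted run, so blocklet min/max ranges overlap more and filters prune less. This is a sketch of the concept, not CarbonData code:

```python
def local_sort_load(rows, num_tasks):
    """Split rows across tasks and sort within each task only,
    mimicking a local-sort load; returns the per-task sorted runs."""
    chunk = max(1, len(rows) // num_tasks)
    tasks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    return [sorted(t) for t in tasks]


rows = [5, 1, 4, 2, 8, 3, 7, 6]
few = local_sort_load(rows, 2)   # fewer tasks -> fewer, longer sorted runs
many = local_sort_load(rows, 4)  # more tasks -> more runs, weaker ordering
```

Each run is internally sorted, but global order degrades as the run count grows, which is why deciding the task count purely from input file size is a poor fit for local sort.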





[jira] [Resolved] (CARBONDATA-4166) GeoSpatial Query Enhancements

2021-05-09 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4166.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> GeoSpatial Query Enhancements
> -
>
> Key: CARBONDATA-4166
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4166
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: Geo-spatial Enhancements_v1.pdf, Geo-spatial 
> Enhancements_v2.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Design document link:
> [https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing]
>  
> [^Geo-spatial Enhancements_v1.pdf]
>  
> ^Version 2:^
> [https://docs.google.com/document/d/19AQj90Rll9iXBVcCMpcuKanwgDE6wuiGMYYIJzUuktk/edit?usp=sharing]
> [^Geo-spatial Enhancements_v2.pdf]





[jira] [Updated] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-28 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Attachment: Feature impact analysis with segment refactor, time travel, 
transaction manager_v1.xlsx






[jira] [Resolved] (CARBONDATA-4174) Handle exception for desc column

2021-04-26 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4174.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Handle exception for desc column
> 
>
> Key: CARBONDATA-4174
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4174
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.2.0
>
>
> Validation is missing when DESC COLUMN is run on a child of a primitive
> datatype column, or on a non-existent higher-level child of a complex
> datatype column.
> drop table if exists complexcarbontable; create table complexcarbontable 
> (deviceInformationId int,channelsId string,ROMSize string,purchasedate 
> string,mobile struct,MAC array,gamePointId 
> map,contractNumber double) STORED AS carbondata;
> describe column deviceInformationId.x on complexcarbontable; describe column 
> channelsId.x on complexcarbontable;
> describe column mobile.imei.x on complexcarbontable; describe column 
> MAC.item.x on complexcarbontable; describe column gamePointId.key.x on 
> complexcarbontable;
> [Expected Result] :- DESC COLUMN should validate a child of a primitive
> datatype column and a non-existent higher-level child of a complex datatype
> column, and the command execution should fail.
> [Actual Issue] :- No such validation is performed, so the command executes
> successfully.
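The missing validation amounts to walking the requested column path through the schema and rejecting any step below a primitive type or into a non-existent child. A hypothetical Python sketch (the schema shape and function name are illustrative, not CarbonData internals):

```python
def validate_desc_column(schema, path):
    """Reject DESC COLUMN on a child of a primitive column or on a
    non-existent child of a complex column. `schema` maps lower-cased
    column names to a dict of children, or None for primitive types."""
    parts = path.lower().split(".")
    node = schema
    for i, part in enumerate(parts):
        if node is None:
            raise ValueError(
                f"'{'.'.join(parts[:i])}' is primitive and has no child '{part}'")
        if part not in node:
            raise ValueError(f"column '{part}' does not exist")
        node = node[part]
    return True


# Rough shape of the repro table above (element types elided).
schema = {
    "deviceinformationid": None,                   # primitive int
    "mobile": {"imei": None},                      # struct child
    "gamepointid": {"key": None, "value": None},   # map children
}
```

With this check in place, `deviceInformationId.x` or `mobile.imei.x` fails fast with a clear message instead of executing successfully.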





[jira] [Resolved] (CARBONDATA-4173) Fix inverted index query issue

2021-04-26 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4173.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Fix inverted index query issue
> --
>
> Key: CARBONDATA-4173
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4173
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> A SELECT query filtering on a column that is present in inverted_index does
> not return any rows.
> From Spark beeline/SQL/Shell execute the following queries
> drop table if exists uniqdata6;
> CREATE TABLE uniqdata6(cust_id int,cust_name string,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 
> int)stored as carbondata TBLPROPERTIES ('sort_columns'='CUST_ID,CUST_NAME', 
> 'inverted_index'='CUST_ID,CUST_NAME','sort_scope'='global_sort');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
> uniqdata6 OPTIONS ('FILEHEADER'='CUST_ID,CUST_NAME 
> ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> select cust_name from uniqdata6 limit 5;
> select * from uniqdata6 where CUST_NAME='CUST_NAME_2';
> select * from uniqdata6 where CUST_NAME='CUST_NAME_3';
>  
> [Expected Result] :- A SELECT query filtering on a column present in
> inverted_index should return the correct rows.
> [Actual Issue] :- A SELECT query filtering on a column present in
> inverted_index does not return any rows.





[jira] [Updated] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Attachment: Transaction manager, time travel, segment interface_v1.pdf

> Transaction Manager, time travel and segment interface refactoring
> --
>
> Key: CARBONDATA-4171
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
> Project: CarbonData
>  Issue Type: Improvement
>    Reporter: Ajantha Bhat
>Priority: Major
> Attachments: Transaction manager, time travel, segment 
> interface_v1.pdf
>
>
> *Goals:*
> *1) Implement a “Transaction Manager” with optimistic concurrency to provide 
> within a table transaction / versioning.* (interfaces should also be flexible 
> enough to support across table transactions)
> *2) Support time travel in carbonData.*
> *3) Decouple and clean up segment interfaces.* (which should also help in 
> supporting segment concepts to other open format under carbondata metadata 
> service)





[jira] [Closed] (CARBONDATA-2827) Refactor Segment Status Manager Interface

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat closed CARBONDATA-2827.

Resolution: Duplicate

> Refactor Segment Status Manager Interface
> -
>
> Key: CARBONDATA-2827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2827
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Ravindra Pesala
>Priority: Major
> Attachments: Segment Management interface design_V3.pdf, Segment 
> Status Management interface design_V1.docx, Segment Status Management 
> interface design_V1_Ramana_reviewed.docx, Segment Status Management interface 
> design_V2.pdf
>
>
> Carbon uses a tablestatus file to record the status and details of each
> segment during every load. The tablestatus file enables Carbon to support
> concurrent loads and reads without data inconsistency or corruption.
> It is therefore a very important part of CarbonData, and we should have
> clean interfaces to maintain it. The current tablestatus update logic is
> scattered across multiple places with no clean interface, so I propose
> refactoring the current SegmentStatusManager interface and bringing all
> tablestatus operations into a single interface.
> The new interface allows the table status to be kept in other storage, such
> as a DB. This is needed for S3-type object stores, as they are eventually
> consistent.
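The single-interface idea can be sketched as one abstract store that both a file-backed and a DB-backed implementation satisfy. This Python sketch uses illustrative names, not the proposed CarbonData interface:

```python
from abc import ABC, abstractmethod


class TableStatusStore(ABC):
    """Single interface for all tablestatus operations, so the backing
    store (a file on HDFS/S3, or a DB for eventually consistent object
    stores) can be swapped without touching callers."""

    @abstractmethod
    def read_segments(self):
        """Return the recorded segment entries."""

    @abstractmethod
    def write_segments(self, segments):
        """Replace the recorded segment entries atomically."""


class InMemoryTableStatusStore(TableStatusStore):
    """DB-like stand-in used here just to exercise the interface."""

    def __init__(self):
        self._segments = []

    def read_segments(self):
        return list(self._segments)

    def write_segments(self, segments):
        self._segments = list(segments)


store = InMemoryTableStatusStore()
store.write_segments([{"id": "0", "status": "SUCCESS"}])
```

Callers depend only on `TableStatusStore`, so moving tablestatus from a file to a DB becomes a configuration choice rather than a refactor.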





[jira] [Commented] (CARBONDATA-2827) Refactor Segment Status Manager Interface

2021-04-22 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17327125#comment-17327125
 ] 

Ajantha Bhat commented on CARBONDATA-2827:
--

Will be handled as part of CARBONDATA-4171:

https://issues.apache.org/jira/browse/CARBONDATA-4171






[jira] [Updated] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4171:
-
Description: 
*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
within a table transaction / versioning.* (interfaces should also be flexible 
enough to support across table transactions)

*2) Support time travel in carbonData.*

*3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts to other open format under carbondata metadata 
service)

  was:
*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
within a table transaction / versioning.* (interfaces should also be flexible 
enough to support across table transactions)

*2) Support time travel in carbonData.*

***3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts to other open format under carbondata metadata 
service)







[jira] [Created] (CARBONDATA-4171) Transaction Manager, time travel and segment interface refactoring

2021-04-22 Thread Ajantha Bhat (Jira)
Ajantha Bhat created CARBONDATA-4171:


 Summary: Transaction Manager, time travel and segment interface 
refactoring
 Key: CARBONDATA-4171
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4171
 Project: CarbonData
  Issue Type: Improvement
Reporter: Ajantha Bhat


*Goals:*

*1) Implement a “Transaction Manager” with optimistic concurrency to provide 
within a table transaction / versioning.* (interfaces should also be flexible 
enough to support across table transactions)

*2) Support time travel in carbonData.*

*3) Decouple and clean up segment interfaces.* (which should also help in 
supporting segment concepts to other open format under carbondata metadata 
service)





[jira] [Resolved] (CARBONDATA-4065) Support MERGE INTO SQL Command

2021-03-31 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4065.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Support MERGE  INTO SQL Command
> ---
>
> Key: CARBONDATA-4065
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4065
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jianxi Li
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-4032) Drop partition command clean other partition dictionaries

2021-03-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4032.
--
Resolution: Fixed

> Drop partition command clean other partition dictionaries
> -
>
> Key: CARBONDATA-4032
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4032
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Critical
> Fix For: 2.1.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> 1. CREATE TABLE droppartition (id STRING, sales STRING) PARTITIONED BY (dtm 
> STRING) STORED AS carbondata;
> 2. insert into droppartition values ('01', '0', '20200907'),('03', '0', 
> '20200908');
> 3. insert overwrite table droppartition partition (dtm=20200908) select * 
> from droppartition where dtm = 20200907;
> insert overwrite table droppartition partition (dtm=20200909) select * from 
> droppartition where dtm = 20200907;
> 4. alter table droppartition drop partition (dtm=20200909);
> After step 4, the dictionary for partition "20200908" was deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4026) Thread leakage while Loading

2021-03-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4026:
-
Fix Version/s: (was: 2.1.1)
   2.1.0

> Thread leakage while Loading
> 
>
> Key: CARBONDATA-4026
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4026
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Some code paths in Insert/Load/InsertStage/IndexServer do not shut down their 
> ExecutorService, which leads to thread leakage that degrades the performance 
> of the driver and executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4026) Thread leakage while Loading

2021-03-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4026.
--
Resolution: Fixed

> Thread leakage while Loading
> 
>
> Key: CARBONDATA-4026
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4026
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Some code paths in Insert/Load/InsertStage/IndexServer do not shut down their 
> ExecutorService, which leads to thread leakage that degrades the performance 
> of the driver and executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4032) Drop partition command clean other partition dictionaries

2021-03-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4032:
-
Fix Version/s: (was: 2.1.1)
   2.1.0

> Drop partition command clean other partition dictionaries
> -
>
> Key: CARBONDATA-4032
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4032
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Critical
> Fix For: 2.1.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> 1. CREATE TABLE droppartition (id STRING, sales STRING) PARTITIONED BY (dtm 
> STRING) STORED AS carbondata;
> 2. insert into droppartition values ('01', '0', '20200907'),('03', '0', 
> '20200908');
> 3. insert overwrite table droppartition partition (dtm=20200908) select * 
> from droppartition where dtm = 20200907;
> insert overwrite table droppartition partition (dtm=20200909) select * from 
> droppartition where dtm = 20200907;
> 4. alter table droppartition drop partition (dtm=20200909);
> After step 4, the dictionary for partition "20200908" was deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4154) Concurrent issues with clean files command

2021-03-25 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4154.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Concurrent issues with clean files command
> --
>
> Key: CARBONDATA-4154
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4154
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Load and compaction failure when concurrently ran with clean files command



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4102) Add UT and FT to improve coverage of SI module.

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4102:
-
Issue Type: Test  (was: Bug)

> Add UT and FT to improve coverage of SI module.
> ---
>
> Key: CARBONDATA-4102
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4102
> Project: CarbonData
>  Issue Type: Test
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> Add UT and FT to improve coverage of the SI module, and also remove dead or 
> unused code if it exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3991) File system could not set modified time because don't override the settime function

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3991:
-
Fix Version/s: (was: 2.1.1)

> File system could not set modified time because don't override the settime 
> function
> ---
>
> Key: CARBONDATA-3991
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3991
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.1
>Reporter: jingpan xiong
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> File systems such as S3 and Alluxio do not override the settime function, which 
> causes problems for update and create MV operations. This bug cannot raise an 
> exception on setting the modified time, and may set a null value as the modified 
> time. This may cause multi-tenant and data-consistency problems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.

2021-03-17 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303162#comment-17303162
 ] 

Ajantha Bhat commented on CARBONDATA-3908:
--

[https://github.com/apache/carbondata/pull/4001]

 

> When a carbon segment is added through the alter add segments query, then it 
> is not accounting the added carbon segment values.
> ---
>
> Key: CARBONDATA-3908
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: FI cluster and opensource cluster.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.1.1
>
>
> When a carbon segment is added through the alter add segments query, the 
> added carbon segment's values are not accounted for. If we do count(*) on the 
> added segment, it always shows 0.
> Test queries:
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> --hdfs dfs -mkdir /uniqdata-carbon-segment;
> --hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* 
> /uniqdata-carbon-segment/
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');
> select count(*) from uniqdata;--4000 expected as one load of 2000 records 
> happened and same segment is added again;
> set carbon.input.segments.default.uniqdata=1;
> select count(*) from uniqdata;--2000 expected - it should just show the 
> records count of added segments;
> CONSOLE:
> /> set carbon.input.segments.default.uniqdata=1;
> +------------------------------------------+--------+
> |                   key                    | value  |
> +------------------------------------------+--------+
> | carbon.input.segments.default.uniqdata   | 1      |
> +------------------------------------------+--------+
> 1 row selected (0.192 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1734
> +-----------+
> | count(1)  |
> +-----------+
> | 2000      |
> +-----------+
> 1 row selected (4.036 seconds)
> /> set carbon.input.segments.default.uniqdata=2;
> +------------------------------------------+--------+
> |                   key                    | value  |
> +------------------------------------------+--------+
> | carbon.input.segments.default.uniqdata   | 2      |
> +------------------------------------------+--------+
> 1 row selected (0.088 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1745
> +-----------+
> | count(1)  |
> +-----------+
> | 2000      |
> +-----------+
> 1 row selected (6.056 seconds)
> /> set carbon.input.segments.default.uniqdata=3;
> +------------------------------------------+--------+
> |                   key                    | value  |
> +------------------------------------------+--------+
> | carbon.input.segments.default.uniqdata   | 3      |
> +------------------------------------------+--------+
> 1 row selected (0.161 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1753
> +-----------+
> | count(1)  |
> +-----------+
> | 0         |
> +-----------+
> 1 row selected (4.875 seconds)
> /> show segments for table uniqdata;
> +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+
> | ID  | Status   | Load Start Time          | Load Time Taken  | Partition  | Data Size  | Index Size  | File Format  |
> +-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+
> | 4   | Success  | 2020-07-17 16:01:53.673  | 5.579S           | {}         | 269.10KB   | 7.21KB      | columnar_v3  |
> | 3   | Success  | 2020-07-17 16:00:24.866  | 0.578S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
> | 2   | Success  | 2020-07-17 15:07:54.273  | 0.642S           | {}         | 36.72KB    | NA          | orc          |
> | 1   | Success  | 2020-07-17 15:03:59.767  | 0.564S           | {}         | 89.26KB    | NA          | parquet      |
> | 0   | Success  | 2020-07-16 12:44:32.095  | 4.484S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
> +-

[jira] [Updated] (CARBONDATA-3880) How to start JDBC service in distributed index

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3880:
-
Fix Version/s: (was: 2.1.1)

>  How to start JDBC service in distributed index
> ---
>
> Key: CARBONDATA-3880
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3880
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: li
>Priority: Major
>
> How to start JDBC service in distributed index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3880) How to start JDBC service in distributed index

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3880.
--
Resolution: Not A Problem

>  How to start JDBC service in distributed index
> ---
>
> Key: CARBONDATA-3880
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3880
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: li
>Priority: Major
>
> How to start JDBC service in distributed index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3643:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Insert array('')/array() into Struct column will result in 
> array(null), which is inconsistent with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.2.0
>
>
>  
> {code:java}
> //
> sql("create table datatype_struct_parquet(price struct>) 
> stored as parquet") 
> sql("insert into table datatype_struct_parquet values(named_struct('b', 
> array('')))") 
> sql("create table datatype_struct_carbondata(price struct>) 
> stored as carbondata") 
> sql("insert into datatype_struct_carbondata select * from 
> datatype_struct_parquet")
> checkAnswer( sql("SELECT * FROM datatype_struct_carbondata"), sql("SELECT * 
> FROM datatype_struct_parquet"))
> !== Correct Answer - 1 == == Spark Answer - 1 == 
> ![[WrappedArray()]] [[WrappedArray(null)]]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4137:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Refactor CarbonDataSourceScan without Spark Filter
> --
>
> Key: CARBONDATA-4137
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4137:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Refactor CarbonDataSourceScan without Spark Filter
> --
>
> Key: CARBONDATA-4137
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3816:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support Float and Decimal in the Merge Flow
> ---
>
> Key: CARBONDATA-3816
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3816
> Project: CarbonData
>  Issue Type: New Feature
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.2.0
>
>
> We don't support FLOAT and DECIMAL datatype in the CDC Flow. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command

2021-03-17 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3615:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> ---
>
> Key: CARBONDATA-3615
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3615
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> +-------------+---------+--------------------------+------------------+
> |    Field    |  Size   |         Comment          |  Cache Location  |
> +-------------+---------+--------------------------+------------------+
> | Index       | 0 B     | 0/2 index files cached   | DRIVER           |
> | Dictionary  | 0 B     |                          | DRIVER           |
> *| Index       | 1.5 KB  | 2/2 index files cached   | INDEX SERVER     |*
> *| Dictionary  | 0 B     |                          | INDEX SERVER     |*
> +-------------+---------+--------------------------+------------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command

2021-03-17 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303133#comment-17303133
 ] 

Ajantha Bhat commented on CARBONDATA-3615:
--

[~vikramahuja_]: please check and close the issue if it is already handled.

> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> ---
>
> Key: CARBONDATA-3615
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3615
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> +-------------+---------+--------------------------+------------------+
> |    Field    |  Size   |         Comment          |  Cache Location  |
> +-------------+---------+--------------------------+------------------+
> | Index       | 0 B     | 0/2 index files cached   | DRIVER           |
> | Dictionary  | 0 B     |                          | DRIVER           |
> *| Index       | 1.5 KB  | 2/2 index files cached   | INDEX SERVER     |*
> *| Dictionary  | 0 B     |                          | INDEX SERVER     |*
> +-------------+---------+--------------------------+------------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3875) Support show segments include stage

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3875:
-
Fix Version/s: (was: 2.1.1)
   2.1.0

> Support show segments include stage
> ---
>
> Key: CARBONDATA-3875
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3875
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There is a lack of monitoring of stage information in the current system; a 
> 'SHOW SEGMENTS INCLUDE STAGE' command shall be supported, which will provide 
> monitoring information such as createTime, partition info, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3856) Support the LIMIT operator for show segments command

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3856:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support the LIMIT operator for show segments command
> 
>
> Key: CARBONDATA-3856
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3856
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> As of the 2.0.0 release, CarbonData does not support the LIMIT operator in the 
> SHOW SEGMENTS command, so the time cost is expensive when there are too many 
> segments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4095:
-
Issue Type: Bug  (was: Improvement)

> Select Query with SI filter fails, when columnDrift is enabled
> --
>
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
sql("drop table if exists maintable")
 sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata ")
 sql("insert into maintable values('k','d',2,3)")
 sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
 sql("create index indextable on table maintable(b) AS 'carbondata'")
 sql("insert into maintable values('k','x',2,4)")
 sql("select * from maintable where b='x'").show(false)
>  
>  
>  
>  
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 
> (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
>  at 
> org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
>  at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
&

[jira] [Updated] (CARBONDATA-4003) Improve IUD Concurrency

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4003:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Improve IUD Concurrency
> ---
>
> Key: CARBONDATA-4003
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4003
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Kejian Li
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> When some segments of the table are in the INSERT IN PROGRESS state, update 
> operations on the table fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3617) loadDataUsingGlobalSort should based on SortColumns Instead Of Whole CarbonRow

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3617.
--
Fix Version/s: (was: 2.1.1)
   2.0.0
   Resolution: Fixed

> loadDataUsingGlobalSort should based on SortColumns Instead Of Whole CarbonRow
> --
>
> Key: CARBONDATA-3617
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3617
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> During data loading using global sort, the sort processing is based on the 
> whole carbon row, and the GC overhead is huge when there are many columns. 
> Theoretically, the sort processing can work just as well based only on the sort 
> columns, which brings less time overhead and less GC overhead.
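The idea above can be sketched as a comparator that touches only the sort-column indices, carrying the rest of the row along as opaque payload; the class and method names here are illustrative and not CarbonData's actual global-sort code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortColumnsSketch {
    // Sorts rows by the designated sort-column indices only; the remaining
    // fields are never compared, so comparisons stay cheap even for very
    // wide rows.
    @SuppressWarnings("unchecked")
    static List<Object[]> sortBySortColumns(List<Object[]> rows, int[] sortCols) {
        Comparator<Object[]> cmp = (a, b) -> {
            for (int c : sortCols) {
                int r = ((Comparable<Object>) a[c]).compareTo(b[c]);
                if (r != 0) {
                    return r;
                }
            }
            return 0;
        };
        rows.sort(cmp);
        return rows;
    }

    public static void main(String[] args) {
        // Columns 0 and 1 are the sort columns; column 2 stands in for the
        // many non-sort columns of a wide carbon row.
        List<Object[]> rows = new ArrayList<>(Arrays.asList(
            new Object[]{"b", 2, "wide-row-payload-1"},
            new Object[]{"a", 9, "wide-row-payload-2"},
            new Object[]{"a", 1, "wide-row-payload-3"}));
        sortBySortColumns(rows, new int[]{0, 1});
        System.out.println(rows.get(0)[2]); // prints wide-row-payload-3
    }
}
```

Because the comparator only dereferences the sort columns, the amount of short-lived comparison garbage no longer grows with the total column count, which is the GC saving the issue describes.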



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3603) Feature Change in CarbonData 2.0

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3603:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Feature Change in CarbonData 2.0
> 
>
> Key: CARBONDATA-3603
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3603
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3559) Support adding carbon file into CarbonData table

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3559:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support adding carbon file into CarbonData table
> 
>
> Key: CARBONDATA-3559
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3559
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Since adding parquet/orc files into a CarbonData table is supported now, 
> adding carbon files should be supported as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3370) fix missing version of maven-duplicate-finder-plugin

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3370:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> fix missing version of maven-duplicate-finder-plugin
> 
>
> Key: CARBONDATA-3370
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3370
> Project: CarbonData
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.5.3
>Reporter: lamber-ken
>Priority: Critical
> Fix For: 2.2.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> fix missing version of maven-duplicate-finder-plugin in pom file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2021-03-16 Thread Ajantha Bhat (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303118#comment-17303118
 ] 

Ajantha Bhat commented on CARBONDATA-3670:
--

Already handled in 2.0 from https://github.com/apache/carbondata/pull/3638

> Support compress offheap columnpage directly, avoiding a copy of data from 
> offheap to heap when compressed.
> --
>
> Key: CARBONDATA-3670
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When writing data, the column pages are stored off-heap, and the pages are 
> compressed to save storage cost. Currently, in the compression processing, the 
> data is copied from off-heap to heap before being compressed, which leads to 
> heavier GC overhead compared with compressing off-heap directly. 
> To sum up, we should support compressing off-heap column pages directly, 
> avoiding a copy of the data from off-heap to heap when compressed.
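As a minimal illustration of compressing off-heap data without a heap copy, the JDK's own Deflater can read directly from a direct ByteBuffer (Deflater.setInput(ByteBuffer) requires Java 11+). This is only a sketch under those assumptions; CarbonData's actual compressor interfaces and codecs differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class OffHeapCompressSketch {
    // Compresses data held in a direct (off-heap) buffer without first
    // copying it into an on-heap byte[].
    static int compressDirect(ByteBuffer page, byte[] out) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(page);  // reads straight from off-heap memory
            deflater.finish();
            return deflater.deflate(out);
        } finally {
            deflater.end();           // release native zlib resources
        }
    }

    // Small end-to-end demo: build a direct buffer and compress it.
    static int demo() {
        byte[] raw = "carbondata column page carbondata column page"
            .getBytes(StandardCharsets.UTF_8);
        ByteBuffer page = ByteBuffer.allocateDirect(raw.length);
        page.put(raw).flip();
        byte[] out = new byte[256];
        return compressDirect(page, out);
    }

    public static void main(String[] args) {
        System.out.println(demo() > 0); // prints true
    }
}
```

The point of the sketch is that the compressor consumes the direct buffer in place, so no intermediate on-heap byte[] of the page's size is ever allocated.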



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-3670.
--
Fix Version/s: (was: 2.1.1)
   Resolution: Duplicate

> Support compress offheap columnpage directly, avoiding a copy of data from 
> offheap to heap when compressed.
> --
>
> Key: CARBONDATA-3670
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When writing data, the column pages are stored off-heap, and the pages are 
> compressed to save storage cost. Currently, in the compression processing, the 
> data is copied from off-heap to heap before being compressed, which leads to 
> heavier GC overhead compared with compressing off-heap directly. 
> To sum up, we should support compressing off-heap column pages directly, 
> avoiding a copy of the data from off-heap to heap when compressed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3746) Support column chunk cache creation and basic read/write

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3746:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Support column chunk cache creation and basic read/write
> 
>
> Key: CARBONDATA-3746
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3746
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>






[jira] [Updated] (CARBONDATA-3608) Drop 'STORED BY' syntax in create table

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3608:
-
Fix Version/s: (was: 2.1.1)
   2.2.0

> Drop 'STORED BY' syntax in create table
> ---
>
> Key: CARBONDATA-3608
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3608
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.2.0
>
>






[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4034:
-
Fix Version/s: 2.1.0

> Improve the time-consuming of Horizontal Compaction for update
> --
>
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> In the update flow, horizontal compaction will be significantly slower when 
> updating with a lot of segments (or a lot of blocks). The cost for one such 
> case is shown in the log.
> {code:java}
> 2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Update Compaction operation started for 
> [ods_oms.oms_wh_outbound_order] 
>  2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Update Compaction operation completed for 
> [ods_oms.oms_wh_outbound_order]. 
>  2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Delete Compaction operation started for 
> [ods_oms.oms_wh_outbound_order] 
>  2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | 
> Horizontal Delete Compaction operation completed for 
> [ods_oms.oms_wh_outbound_order].{code}
> In this PR, we optimize the process between the second and third rows of the 
> log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal 
> compaction flow.
>  





[jira] [Updated] (CARBONDATA-4057) Support Complex DataType when Save DataFrame

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4057:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support Complex DataType when Save DataFrame
> 
>
> Key: CARBONDATA-4057
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4057
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, once df.mode(overwrite).save is triggered, complex data types are 
> not supported; this should be supported.





[jira] [Updated] (CARBONDATA-4030) Concurrent SI global sort cannot be success

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4030:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Concurrent SI global sort cannot be success
> ---
>
> Key: CARBONDATA-4030
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4030
> Project: CarbonData
>  Issue Type: Bug
>    Reporter: Ajantha Bhat
>    Assignee: Ajantha Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When concurrent SI global sort is in progress, one load was removing the 
> table property added by the other load. So the global sort insert for one 
> load was failing with an error that the position id could not be found in the projection.
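The failure described above is a classic lost update on shared table metadata. The sketch below uses a hypothetical property key and an in-memory map, not CarbonData's actual metadata store; it shows why a blanket reset during one load's cleanup loses the other load's property, and the per-key cleanup that avoids it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TablePropsRace {
    public static void main(String[] args) {
        // Shared table properties, as two concurrent SI loads would see them.
        Map<String, String> props = new ConcurrentHashMap<>();

        // Load A adds the property its global-sort insert depends on.
        props.put("si.positionref.loadA", "true");

        // Buggy cleanup in load B: resets the whole map instead of removing
        // only its own key, so load A's property is lost.
        props.clear();
        if (props.containsKey("si.positionref.loadA")) {
            throw new AssertionError("unexpected");
        }

        // Fixed cleanup: each load removes only the key it added, so a
        // concurrent load's property survives.
        props.put("si.positionref.loadA", "true");
        props.remove("si.positionref.loadB");
        if (!props.containsKey("si.positionref.loadA")) {
            throw new AssertionError("load A's property must survive load B's cleanup");
        }
        System.out.println("ok");
    }
}
```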





[jira] [Updated] (CARBONDATA-4064) TPCDS queries are failing with None.get exception when table has SI configured

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4064:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> TPCDS queries are failing with None.get exception when table has SI configured
> --
>
> Key: CARBONDATA-4064
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4064
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4029:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: 3 node FI cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: Primitive.rar
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Do a delete on a table which has alter-added SDK segments; then count(*) is 
> 0. count(*) remains 0 even if any number of SDK segments are added after it.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
> --before executing the below alter add segment-place the attached SDK files 
> in hdfs at /sdkfiles/primitive2 folder;
> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> delete from external_primitive where id =2;select * from external_primitive;
> Console output:
> /> drop table if exists external_primitive;
> +--------+
> | Result |
> +--------+
> +--------+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, 
> salary double, active boolean, dob date, doj timestamp, city string, dept 
> string) stored as carbondata;
> +--------+
> | Result |
> +--------+
> +--------+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> +--------+
> | Result |
> +--------+
> +--------+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
> | 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT    |
> | 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi     | DATA  |
> | 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2;select * from 
> external_primitive;
> INFO : Execution ID: 322
> +-------------------+
> | Deleted Row Count |
> +-------------------+
> | 1                 |
> +-------------------+
> 1 row selected (3.723 seconds)
> +----+------+------+--------+--------+-----+-----+------+------+
> | id | name | rank | salary | active | dob | doj | city | dept |
> +----+------+------+--------+--------+-----+-----+------+------+
> +----+------+------+--------+--------+-----+-----+------+------+
> No rows selected (1.531 seconds)
> /> alter table external_primitive add segment 
> op

[jira] [Updated] (CARBONDATA-4020) Drop bloom index for single index of table having multiple index drops all indexes

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4020:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Drop bloom index for single index of table having multiple index drops all 
> indexes
> --
>
> Key: CARBONDATA-4020
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4020
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Create multiple bloom indexes on the table. Try to drop single bloom index
> drop table if exists datamap_test_1;
>  CREATE TABLE datamap_test_1 (id int,name string,salary float,dob date)STORED 
> as carbondata TBLPROPERTIES('SORT_COLUMNS'='id');
>  
>  CREATE index dm_datamap_test_1_2 ON TABLE datamap_test_1(id) as 
> 'bloomfilter' PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 
> 'BLOOM_COMPRESS'='true');
>  
>  CREATE index dm_datamap_test3 ON TABLE datamap_test_1 (name) as 
> 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 
> 'BLOOM_COMPRESS'='true');
> show indexes on table datamap_test_1;
> drop index dm_datamap_test_1_2 on datamap_test_1;
> show indexes on table datamap_test_1;
>  
> Issue: dropping a single bloom index on a table having multiple indexes drops 
> all the indexes
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +---------------------+-------------+-----------------+---------------------------------------------------------------------+--------+-----------+
> | Name                | Provider    | Indexed Columns | Properties                                                          | Status | Sync Info |
> +---------------------+-------------+-----------------+---------------------------------------------------------------------+--------+-----------+
> | dm_datamap_test_1_2 | bloomfilter | id              | 'INDEX_COLUMNS'='id','bloom_compress'='true','bloom_fpp'='0.1','blo
> | dm_datamap_test3    | bloomfilter | name            | 'INDEX_COLUMNS'='name','bloom_compress'='true','bloom_fpp'='0.1','b
> +---------------------+-------------+-----------------+---------------------------------------------------------------------+--------+-----------+
> 2 rows selected (0.315 seconds)
> 0: jdbc:hive2://linux-32:22550/> drop index dm_datamap_test_1_2 on 
> datamap_test_1;
> +--------+
> | Result |
> +--------+
> +--------+
> No rows selected (1.232 seconds)
> 0: jdbc:hive2://linux-32:22550/> show indexes on table datamap_test_1;
> +------+----------+-----------------+------------+--------+-----------+
> | Name | Provider | Indexed Columns | Properties | Status | Sync Info |
> +------+----------+-----------------+------------+--------+-----------+
> +------+----------+-----------------+------------+--------+-----------+
> No rows selected (0.21 seconds)
> 0: jdbc:hive2://linux-32:22550/>





[jira] [Updated] (CARBONDATA-4078) add external segment and query with index server fails

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4078:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> add external segment and query with index server fails
> --
>
> Key: CARBONDATA-4078
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4078
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: is_noncarbonsegments stacktrace
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The index server tries to cache parquet/orc segments and fails, as it cannot read 
> the file format when the fallback mode is disabled.
> Ex: the 'test parquet table' test case
>  
>  





[jira] [Updated] (CARBONDATA-4093) Add logs for MV and method to verify if mv is in Sync during query

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4093:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Add logs for MV and method to verify if mv is in Sync during query
> --
>
> Key: CARBONDATA-4093
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4093
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4076) Query having Subquery alias used in query projection does not hit mv after creation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4076:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Query having a subquery alias used in the query projection does not hit the mv 
> after creation
> --
>
> Key: CARBONDATA-4076
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4076
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> CREATE TABLE fact_table1 (empname String, designation String, doj Timestamp,
> workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,
> projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int,
> utilization int, salary int)
> STORED AS carbondata;
> create materialized view mv_sub as select empname, sum(result) 
> sum_ut from (select empname, utilization result from fact_table1) fact_table1 
> group by empname;
>  
> select empname, sum(result) sum_ut from (select empname, 
> utilization result from fact_table1) fact_table1 group by empname;
>  
> explain select empname, sum(result) sum_ut from (select 
> empname, utilization result from fact_table1) fact_table1 group by 
> empname;
>  
> Expected: Query should hit MV
> Actual: Query is not hitting MV
>  
>  





[jira] [Updated] (CARBONDATA-4092) Insert command fails with concurrent delete segment operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4092:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Insert command fails with concurrent delete segment operation
> -
>
> Key: CARBONDATA-4092
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4092
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3987) Issues in SDK Pagination reader (2 issues)

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3987:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Issues in SDK Pagination reader (2 issues)
> --
>
> Key: CARBONDATA-3987
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3987
> Project: CarbonData
>  Issue Type: Bug
>  Components: other
>Affects Versions: 2.1.0
>Reporter: Chetan Bhat
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Issue 1: 
> Write data to a table and insert one more row; an error is thrown when trying 
> to read the newly added row, whereas getTotalRows is incremented by 1.
> Test code-
> /**
>  * Carbon Files are written using CarbonWriter in outputpath
>  *
>  * Carbon Files are read using paginationCarbonReader object
>  * Checking pagination with insert on large data with 8 split
>  */
>  @Test
>  public void testSDKPaginationInsertData() throws IOException, 
> InvalidLoadOptionException, InterruptedException {
>  System.out.println("___" + 
> name.getMethodName() + " TestCase Execution is 
> started");
> //
> // String outputPath1 = getOutputPath(outputDir, name.getMethodName() + 
> "large");
> //
> // long uid = 123456;
> // TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
> // writeMultipleCarbonFiles("id int,name string,rank short,salary 
> double,active boolean,dob date,doj timestamp,city string,dept string", 
> getDatas(), outputPath1, uid, null, null);
> //
> // System.out.println("Data is written");
List<String[]> data1 = new ArrayList<>();
>  String[] row1 = {"1", "AAA", "3", "3444345.66", "true", "1979-12-09", 
> "2011-2-10 1:00:20", "Pune", "IT"};
>  String[] row2 = {"2", "BBB", "2", "543124.66", "false", "1987-2-19", 
> "2017-1-1 12:00:20", "Bangalore", "DATA"};
>  String[] row3 = {"3", "CCC", "1", "787878.888", "false", "1982-05-12", 
> "2015-12-1 2:20:20", "Pune", "DATA"};
>  String[] row4 = {"4", "DDD", "1", "9.24", "true", "1981-04-09", 
> "2000-1-15 7:00:20", "Delhi", "MAINS"};
>  String[] row5 = {"5", "EEE", "3", "545656.99", "true", "1987-12-09", 
> "2017-11-25 04:00:20", "Delhi", "IT"};
> data1.add(row1);
>  data1.add(row2);
>  data1.add(row3);
>  data1.add(row4);
>  data1.add(row5);
> String outputPath1 = getOutputPath(outputDir, name.getMethodName() + "large");
> long uid = 123456;
>  TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"));
>  writeMultipleCarbonFiles("id int,name string,rank short,salary double,active 
> boolean,dob date,doj timestamp,city string,dept string", data1, outputPath1, 
> uid, null, null);
> System.out.println("Data is written");
> String hdfsPath1 = moveFiles(outputPath1, outputPath1);
>  String datapath1 = hdfsPath1.concat("/" + name.getMethodName() + "large");
>  System.out.println("HDFS Data Path is: " + datapath1);
> runSQL("create table " + name.getMethodName() + "large" + " using carbon 
> location '" + datapath1 + "'");
>  System.out.println("Table " + name.getMethodName() + " is created 
> Successfully");
>  runSQL("select count(*) from " + name.getMethodName() + "large");
>  long uid1 = 123;
>  String outputPath = getOutputPath(outputDir, name.getMethodName());
>  List<String[]> data = new ArrayList<>();
>  String[] row = {"222", "Daisy", "3", "334.456", "true", "1956-11-08", 
> "2013-12-10 12:00:20", "Pune", "IT"};
>  data.add(row);
>  writeData("id int,name string,rank short,salary double,active boolean,dob 
> date,doj timestamp,city string,dept string", data, outputPath, uid, null, 
> null);
>  String hdfsPath = moveFiles(outputPath, outputPath);
>  String datapath = hdfsPath.concat("/" + name.getMethodName());
> run

[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4051:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Geo spatial index algorithm improvement and UDFs enhancement
> 
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: CarbonData Spatial Index Design Doc v2.docx
>
>  Time Spent: 21h 10m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; related algorithms are provided by the Discovery Team.
> 1. Replace the geohash encoding algorithm, and reduce the required properties of 
> CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>  timevalue BIGINT,
>  longitude LONG,
>  latitude LONG) COMMENT "This is a GeoTable"
>  STORED AS carbondata
>  TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>  'SPATIAL_INDEX.mygeohash.type'='geohash',
>  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>  'SPATIAL_INDEX.mygeohash.gridSize'='50',
>  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs :
>  * _*InPolygonList (List polygonList, OperationType opType)*_
>  * _*InPolylineList (List polylineList, Float bufferInMeter)*_
>  * _*InPolygonRangeList (List RangeList, **OperationType opType**)*_
> *operation only support :*
>  * *"OR", means calculating union of two polygons*
>  * *"AND", means calculating intersection of two polygons*
> geo util UDFs :
>  * _*GeoIdToGridXy(Long geoId) :* *Pair*_
>  * _*LatLngToGeoId(**Long* *latitude, Long* *longitude) : Long*_
>  * _*GeoIdToLatLng(Long geoId) : Pair*_
>  * _*ToUpperLayerGeoId(Long geoId) : Long*_
>  * _*ToRangeList (String polygon) : List*_
> 3. Currently GeoID is a column created internally for spatial tables; this PR 
> will allow the GeoID column to be customized during LOAD/INSERT INTO. For 
> example, 
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, where '855280799612' is generated internally:
> +--------------+-----------+-----------+----------+
> | mygeohash    | timevalue | longitude | latitude |
> +--------------+-----------+-----------+----------+
> | 855280799612 | 157542840 | 116285807 | 40084087 |
> +--------------+-----------+-----------+----------+
> but now is
> +-----------+-----------+-----------+----------+
> | mygeohash | timevalue | longitude | latitude |
> +-----------+-----------+-----------+----------+
> | 0         | 157542840 | 116285807 | 40084087 |
> +-----------+-----------+-----------+----------+{code}
>  





[jira] [Updated] (CARBONDATA-4111) Filter query having invalid results after add segment to table having SI with Indexserver

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4111:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Filter query having invalid results after add segment to table having SI with 
> Indexserver
> -
>
> Key: CARBONDATA-4111
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4111
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: addseg_si_is.png
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> queries to execute:
> create table maintable_sdk(a string, b int, c string) stored as carbondata;
>  insert into maintable_sdk select 'k',1,'k';
>  insert into maintable_sdk select 'l',2,'l';
>  CREATE INDEX maintable_si_sdk on table maintable_sdk (c) as 'carbondata';
>  alter table maintable_sdk add segment 
> options('path'='hdfs://hacluster/sdkfiles/newsegment/', 'format'='carbon');
> spark-sql> select * from maintable_sdk where c='m';
> 2021-01-27 12:10:54,326 | WARN | IPC Client (653337757) connection to 
> linux-30/10.19.90.30:22900 from car...@hadoop.com | Unexpected error reading 
> responses on connection Thread[IPC Client (653337757) connection to 
> linux-30/10.19.90.30:22900 from car...@hadoop.com,5,main] | 
> org.apache.hadoop.ipc.Client.run(Client.java:1113)
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.carbondata.core.indexstore.SegmentWrapperContainer.()
>  at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:135)
>  at 
> org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:58)
>  at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:284)
>  at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
>  at 
> org.apache.hadoop.ipc.RpcWritable$WritableWrapper.readFrom(RpcWritable.java:85)
>  at org.apache.hadoop.ipc.RpcWritable$Buffer.getValue(RpcWritable.java:187)
>  at org.apache.hadoop.ipc.RpcWritable$Buffer.newInstance(RpcWritable.java:183)
>  at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1223)
>  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1107)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.carbondata.core.indexstore.SegmentWrapperContainer.()
>  at java.lang.Class.getConstructor0(Class.java:3082)
>  at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>  at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
>  ... 8 more
> 2021-01-27 12:10:54,330 | WARN | main | Distributed Segment Pruning failed, 
> initiating embedded pruning | 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:349)
> java.lang.reflect.UndeclaredThrowableException
>  at com.sun.proxy.$Proxy59.getPrunedSegments(Unknown Source)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:341)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin$.getFilteredSegments(BroadCastSIFilterPushJoin.scala:426)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions$lzycompute(BroadCastSIFilterPushJoin.scala:80)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.partitions(BroadCastSIFilterPushJoin.scala:78)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy$lzycompute(BroadCastSIFilterPushJoin.scala:94)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.inputCopy(BroadCastSIFilterPushJoin.scala:93)
>  at 
> org.apache.spark.sql.secondaryindex.joins.BroadCastSIFilterPushJoin.doExecute(BroadCastSIFilterPushJoin.scala:132)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:177)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:201)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:198)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:173)
>  at 
> org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:293)
>  at 
> org.apa

[jira] [Updated] (CARBONDATA-4094) Select count(*) on partition table fails in index server fallback mode

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4094:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select count(*) on partition table fails in index server fallback mode
> --
>
> Key: CARBONDATA-4094
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4094
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4075) Should refactor to use withEvents instead of fireEvent

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4075:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Should refactor to use withEvents instead of fireEvent
> --
>
> Key: CARBONDATA-4075
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4075
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4124) Refresh MV which does not exist is not throwing proper message

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4124:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Refresh MV which does not exist is not throwing proper message
> --
>
> Key: CARBONDATA-4124
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4124
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4110) Support clean files dry run and show statistics after clean files operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4110:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support clean files dry run and show statistics after clean files operation
> ---
>
> Key: CARBONDATA-4110
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4110
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 26h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4053) Alter table rename column failed

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4053:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table rename column failed
> 
>
> Key: CARBONDATA-4053
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4053
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: 截图.PNG
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Alter table rename column failed because the rename incorrectly replaced 
> content in tblproperties with the new column name, even where that content 
> was not related to the column name.
>   !截图.PNG!
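The failure mode can be illustrated as follows; this is a hypothetical sketch, not CarbonData's actual implementation. A blanket string replace over serialized tblproperties rewrites any value that merely contains the old column name, while a targeted rewrite touches only properties that are column-name lists (the two keys chosen here are examples).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RenameColumnProps {

    // Buggy approach: a blanket string replace over every property value
    // rewrites content that merely contains the old column name.
    static Map<String, String> renameBlind(Map<String, String> props, String oldCol, String newCol) {
        Map<String, String> out = new LinkedHashMap<>();
        props.forEach((k, v) -> out.put(k, v.replace(oldCol, newCol)));
        return out;
    }

    // Safer approach: rewrite only properties known to hold column-name lists,
    // matching whole column names rather than substrings.
    static Map<String, String> renameTargeted(Map<String, String> props, String oldCol, String newCol) {
        Map<String, String> out = new LinkedHashMap<>(props);
        for (String key : new String[]{"sort_columns", "long_string_columns"}) {
            String v = out.get(key);
            if (v == null) continue;
            StringBuilder sb = new StringBuilder();
            for (String col : v.split(",")) {
                if (sb.length() > 0) sb.append(',');
                sb.append(col.trim().equals(oldCol) ? newCol : col.trim());
            }
            out.put(key, sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("sort_columns", "id,city");
        props.put("comment", "ids per city");      // contains "id" only as a substring
        Map<String, String> bad = renameBlind(props, "id", "uid");
        System.out.println(bad.get("comment"));        // "uids per city" -- corrupted
        Map<String, String> good = renameTargeted(props, "id", "uid");
        System.out.println(good.get("sort_columns"));  // "uid,city"
        System.out.println(good.get("comment"));       // "ids per city" -- untouched
    }
}
```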





[jira] [Updated] (CARBONDATA-4052) Select query on SI table after insert overwrite is giving wrong result.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4052:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select query on SI table after insert overwrite is giving wrong result.
> ---
>
> Key: CARBONDATA-4052
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4052
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> # Create carbon table.
>  # Create SI table on the same carbon table.
>  # Do load or insert operation.
>  # Run query insert overwrite on maintable.
> # Now a select query on the SI table shows old as well as new data, when it 
> should show only the new data.
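The steps above can be sketched with the following SQL; the table and index names are illustrative, not taken from the report:

```sql
-- hypothetical repro for the SI + insert overwrite issue
create table maintable (a string, b string, c int) stored as carbondata;
create index indextable on table maintable(b) as 'carbondata';
insert into maintable values ('k', 'x', 2);
-- overwrite should leave only the new data in both main table and SI
insert overwrite table maintable select 'j', 'y', 3;
-- expected: only the new row; the bug makes the SI path return old rows too
select * from maintable where b = 'y';
```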





[jira] [Updated] (CARBONDATA-4066) data mismatch observed with SI and without SI when SI global sort and SI segment merge is true

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4066:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
> --
>
> Key: CARBONDATA-4066
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4066
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> data mismatch observed with SI and without SI when SI global sort and SI 
> segment merge is true
>  
> test case to reproduce the issue:
> CarbonProperties.getInstance()
>   .addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "true")
> sql("create table complextable2 (id int, name string, country array<string>) " +
>   "stored as carbondata tblproperties('sort_scope'='global_sort','sort_columns'='name')")
> sql(s"load data inpath '$resourcesPath/secindex/array.csv' into table complextable2 " +
>   "options('delimiter'=',','quotechar'='\"','fileheader'='id,name,country'," +
>   "'complex_delimiter_level_1'='$','global_sort_partitions'='10')")
> val result = sql("select * from complextable2 where array_contains(country,'china')")
> sql("create index index_2 on table complextable2(country) as 'carbondata' " +
>   "properties('sort_scope'='global_sort')")
> checkAnswer(
>   sql("select count(*) from complextable2 where array_contains(country,'china')"),
>   sql("select count(*) from complextable2 where ni(array_contains(country,'china'))"))





[jira] [Updated] (CARBONDATA-4056) Adding global sort support for SI segments data files merge operation.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4056:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Adding global sort support for SI segments data files merge operation.
> --
>
> Key: CARBONDATA-4056
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4056
> Project: CarbonData
>  Issue Type: New Feature
>  Components: other
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Enabling the carbon property (carbon.si.segment.merge) helps reduce the number 
> of carbondata files in SI segments. When SI is created with sort scope as 
> global sort and this property is enabled, the data in SI segments 
> must be globally sorted after the data files are merged.





[jira] [Updated] (CARBONDATA-4054) Size control of minor compaction

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4054:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Size control of minor compaction
> 
>
> Key: CARBONDATA-4054
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: ZHANGSHUNYU
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Currently, minor compaction only considers the number of segments and major
> compaction only considers the total size of segments. Consider a scenario
>  where the user wants to use minor compaction by segment count but does not
>  want to merge segments whose data size exceeds a threshold (for example
>  2 GB), since merging such large segments is unnecessary and time-consuming.
>  So we need a parameter to control the size threshold of segments included
>  in minor compaction, so that the user can exclude segments from minor
>  compaction once their data size exceeds the threshold; of course a default
>  value must be there.
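A sketch of how such a threshold could look in use; the property name `carbon.minor.compaction.size` is an assumption about the eventual parameter, with the value in MB:

```sql
-- assumed property: segments larger than this (in MB) are skipped by minor compaction
set carbon.minor.compaction.size = 2048;
alter table sales compact 'minor';
```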





[jira] [Updated] (CARBONDATA-4067) Change clean files behaviour to support cleaning of in progress segments

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4067:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Change clean files behaviour to support cleaning of in progress segments
> 
>
> Key: CARBONDATA-4067
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4067
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Change clean files behaviour to support cleaning of in progress segments





[jira] [Updated] (CARBONDATA-4062) Should make clean files become data trash manager

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4062:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Should make clean files become data trash manager
> -
>
> Key: CARBONDATA-4062
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4062
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 26h 10m
>  Remaining Estimate: 0h
>
> To prevent accidental deletion of data, carbon will introduce data trash 
> management. It provides a buffer window during which an accidental delete 
> operation can be rolled back.
> Data trash management is a part of carbon data lifecycle management. Clean 
> files, as the data trash manager, should cover the following two parts.
> part 1: manage metadata-indexed data trash.
>   This data stays at the original location of the table and is indexed by 
> metadata. Carbon manages this data through the metadata index and should avoid 
> using the listFile() interface.
> part 2: manage the ".Trash" folder.
>    Currently the ".Trash" folder has no metadata index, and operations on it 
> are based on timestamps and the listFile() interface. In the future, carbon 
> will index the ".Trash" folder to improve data trash management.





[jira] [Updated] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3908:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> When a carbon segment is added through the alter add segments query, then it 
> is not accounting the added carbon segment values.
> ---
>
> Key: CARBONDATA-3908
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
> Environment: FI cluster and opensource cluster.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.1.1
>
>
> When a carbon segment is added through the alter add segments query, the rows 
> of the added carbon segment are not accounted for. If we run count(*) on the 
> added segment, it always returns 0.
> Test queries:
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> --hdfs dfs -mkdir /uniqdata-carbon-segment;
> --hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* 
> /uniqdata-carbon-segment/
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');
> select count(*) from uniqdata;--4000 expected as one load of 2000 records 
> happened and same segment is added again;
> set carbon.input.segments.default.uniqdata=1;
> select count(*) from uniqdata;--2000 expected - it should just show the 
> records count of added segments;
> CONSOLE:
> /> set carbon.input.segments.default.uniqdata=1;
> +------------------------------------------+-------+
> | key                                      | value |
> +------------------------------------------+-------+
> | carbon.input.segments.default.uniqdata   | 1     |
> +------------------------------------------+-------+
> 1 row selected (0.192 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1734
> +----------+
> | count(1) |
> +----------+
> | 2000     |
> +----------+
> 1 row selected (4.036 seconds)
> /> set carbon.input.segments.default.uniqdata=2;
> +------------------------------------------+-------+
> | key                                      | value |
> +------------------------------------------+-------+
> | carbon.input.segments.default.uniqdata   | 2     |
> +------------------------------------------+-------+
> 1 row selected (0.088 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1745
> +----------+
> | count(1) |
> +----------+
> | 2000     |
> +----------+
> 1 row selected (6.056 seconds)
> /> set carbon.input.segments.default.uniqdata=3;
> +------------------------------------------+-------+
> | key                                      | value |
> +------------------------------------------+-------+
> | carbon.input.segments.default.uniqdata   | 3     |
> +------------------------------------------+-------+
> 1 row selected (0.161 seconds)
> /> select count(*) from uniqdata;
> INFO : Execution ID: 1753
> +----------+
> | count(1) |
> +----------+
> | 0        |
> +----------+
> 1 row selected (4.875 seconds)
> /> show segments for table uniqdata;
> +----+---------+--------------------------+-----------------+-----------+-----------+------------+-------------+
> | ID | Status  | Load Start Time          | Load Time Taken | Partition | Data Size | Index Size | File Format |
> +----+---------+--------------------------+-----------------+-----------+-----------+------------+-------------+
> | 4  | Success | 2020-07-17 16:01:53.673  | 5.579S          | {}        | 269.10KB  | 7.21KB     | columnar_v3 |
> | 3  | Success | 2020-07-17 16:00:24.866  | 0.578S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
> | 2  | Success | 2020-07-17 15:07:54.273  | 0.642S          | {}        | 36.72KB   | NA         | orc         |
> | 1  | Success | 2020-07-17 15:03:59.767  | 0.564S          | {}        | 89.26KB   | NA         | parquet     |
> | 0  | Success | 2020-07-16 12:44:32.095  | 4.484S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
> +----+---------+--------------------------+-----------------+-----------+-----------+------------+-------------+
> Expected result: Records added by adding carbon segment should be considered.
> Actual result: Records added by adding carbon segment is not considered.





[jira] [Updated] (CARBONDATA-4068) Alter table set long string should not allowed on SI column.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4068:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table set long string should not allowed on SI column.
> 
>
> Key: CARBONDATA-4068
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4068
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> # Create table and create SI.
>  # Now try to set long_string_columns on the column on which SI is created.
> The operation should not be allowed because we don't support SI on long 
> string columns.
> create table maintable (a string,b string,c int) STORED AS carbondata;
> create index indextable on table maintable(b) AS 'carbondata';
> insert into maintable values('k','x',2);
> ALTER TABLE maintable SET TBLPROPERTIES('long_String_columns'='b');





[jira] [Updated] (CARBONDATA-4069) Alter table set streaming=true should not be allowed on SI table or table having SI.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4069:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Alter table set streaming=true should not be allowed on SI table or table 
> having SI.
> 
>
> Key: CARBONDATA-4069
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4069
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> # Create carbon table and SI.
>  # Now set streaming = true on either the SI table or the main table.
> Both operations should be disallowed because SI is not supported on 
> streaming tables.
>  
> create table maintable2 (a string,b string,c int) STORED AS carbondata;
> insert into maintable2 values('k','x',2);
> create index m_indextable on table maintable2(b) AS 'carbondata';
> ALTER TABLE maintable2 SET TBLPROPERTIES('streaming'='true');  => operation 
> should not be allowed.
> ALTER TABLE m_indextable SET TBLPROPERTIES('streaming'='true') => operation 
> should not be allowed.





[jira] [Updated] (CARBONDATA-4040) Data mismatch incase of compaction failure and retry success

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4040:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Data mismatch incase of compaction failure and retry success
> 
>
> Key: CARBONDATA-4040
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4040
> Project: CarbonData
>  Issue Type: Bug
>    Reporter: Ajantha Bhat
>    Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> For compaction we don't register the in-progress segment, so when the table 
> status lock cannot be acquired, compaction can fail. At that point the partial 
> compaction segment needs to be cleaned up. If the partial segment cleanup also 
> fails (unable to get the lock, or IO issues) and the user retries the 
> compaction, carbon reuses the same segment id. So while writing the segment 
> file for the new compaction, list only the files belonging to the current 
> compaction, not all the files, which would include stale files.





[jira] [Updated] (CARBONDATA-4081) Clean files considering files apart from .segment files while cleaning stale segments and moving them to trash

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4081:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Clean files considering files apart from .segment files while cleaning stale 
> segments and moving them to trash
> --
>
> Key: CARBONDATA-4081
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4081
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4087) Issue with huge data(exceeding 32K records) after enabling local dictionary

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4087:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Issue with huge data(exceeding 32K records) after enabling local dictionary
> ---
>
> Key: CARBONDATA-4087
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4087
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> For large data, SELECT on array(varchar) throws the exception 
> "Error in Reading Data from Carbondata" due to an ArrayOutOfBounds error.
>  
> https://github.com/apache/carbondata/pull/4055





[jira] [Updated] (CARBONDATA-4072) Clean files command is not deleting .segment files present at metadata/segments/xxxxx.segment for the segments added through alter table add segment query.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4072:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Clean files command is not deleting .segment files present at 
> metadata/segments/x.segment for the segments added through alter table 
> add segment query.
> ---
>
> Key: CARBONDATA-4072
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4072
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Clean files command is not deleting .segment files present at 
> metadata/segments/x.segment for the segments added through alter table 
> add segment query.





[jira] [Updated] (CARBONDATA-4071) If date or timestamp columns are present as child of complex columns, then its giving wrong results on reading through SDK.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4071:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> If date or timestamp columns are present as child of complex columns, then 
> its giving wrong results on reading through SDK.
> ---
>
> Key: CARBONDATA-4071
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4071
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> If a date or timestamp column is present as a child of a complex column, then 
> reading its value through the SDK gives wrong results. For eg: Array





[jira] [Updated] (CARBONDATA-4084) Error when loading string field with high cardinary

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4084:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Error when loading string field with high cardinary 
> 
>
> Key: CARBONDATA-4084
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4084
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0, 2.1.0, 2.0.1
>Reporter: Nguyen Dinh Huynh
>Priority: Major
>  Labels: patch
> Fix For: 2.1.1
>
> Attachments: image-2020-12-14-22-40-45-539.png, 
> image_2020_12_13T09_29_38_891Z.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When trying to load a string field with more than 1M distinct values, some 
> rows show strange values.
>   !image_2020_12_13T09_29_38_891Z.png!
> With the setting carbon.local.dictionary.enable=false it works as expected, 
> so there seem to be bugs in the decoder fallback.
>  
>  
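The workaround mentioned above can also be applied per table; `local_dictionary_enable` is the table-level counterpart of the system property, and the table name is illustrative:

```sql
-- disable local dictionary for the affected high-cardinality table as a workaround
create table high_card_tab (s string) stored as carbondata
tblproperties ('local_dictionary_enable'='false');
```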





[jira] [Updated] (CARBONDATA-4099) Fix Concurrent issues with clean files post event listener

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4099:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Fix Concurrent issues with clean files post event listener
> --
>
> Key: CARBONDATA-4099
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4099
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There were 2 issues in the clean files post event listener:
>  # In concurrent cases, while writing entry back to the table status file, 
> wrong path was given, due to which table status file was not updated in the 
> case of SI table.
>  # While writing the loadmetadetails to the table status file during 
> concurrent scenarios, we were only writing the unwanted segments and not all 
> the segments, which could make segments stale in the SI table
> Due to these 2 issues, when a select query is executed on the SI table, the 
> tablestatus would have an entry for a segment but its carbondata file would 
> be deleted, thus throwing an IO Exception.





[jira] [Updated] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4095:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Select Query with SI filter fails, when columnDrift is enabled
> --
>
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> sql("drop table if exists maintable")
>  sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata ")
>  sql("insert into maintable values('k','d',2,3)")
>  sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
>  sql("create index indextable on table maintable(b) AS 'carbondata'")
>  sql("insert into maintable values('k','x',2,4)")
>  sql("select * from maintable where b='x'").show(false)
>  
>  
>  
>  
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 
> (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
>  at 
> org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
>  at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpress

[jira] [Updated] (CARBONDATA-4073) Added FT for missing scenarios in Presto

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4073:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Added FT for missing scenarios in Presto
> 
>
> Key: CARBONDATA-4073
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4073
> Project: CarbonData
>  Issue Type: Test
>  Components: presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> FT for following has been added.
> update without local-dict
>  delete operation
>  minor, major, custom compaction
>  add and delete segments
>  test update with inverted index 
>  read with partition columns
>  Filter on partition columns
>  Bloom index
>  test range columns
>  read streaming data





[jira] [Updated] (CARBONDATA-4100) SI loads are in inconsistent state with maintable after concurrent(Load) operation

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4100:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> SI loads are in inconsistent state with maintable after 
> concurrent(Load) operation
> -
>
> Key: CARBONDATA-4100
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4100
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-4070) Handle the scenario mentioned in description for SI.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4070:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Handle the scenario mentioned in description for SI.
> 
>
> Key: CARBONDATA-4070
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4070
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> # SI creation should not be allowed on SI table.
>  # SI table should not be scanned with like filter on MT.
>  # Drop column should not be allowed on SI table.
> Add the FT for all above scenario and sort column related scenario.





[jira] [Updated] (CARBONDATA-4059) Block compaction on SI table.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4059:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Block compaction on SI table.
> -
>
> Key: CARBONDATA-4059
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4059
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently compaction is allowed on the SI table. Because of this, if only the 
> SI table is compacted, a filter query on the main table scans more data in 
> the SI table, causing performance degradation.
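A minimal sketch of the operation this issue blocks; the table and index names are illustrative:

```sql
create table maintable (a string, b string, c int) stored as carbondata;
create index idx_b on table maintable(b) as 'carbondata';
-- after the fix this should be rejected; SI segments are compacted
-- only together with the main table
alter table idx_b compact 'minor';
```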





[jira] [Updated] (CARBONDATA-4104) Vector filling for Primitive decimal type needs to be handled

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4104:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Vector filling for Primitive decimal type needs to be handled
> -
>
> Key: CARBONDATA-4104
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4104
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Filling of vectors for a complex primitive decimal type whose precision 
> is greater than 18 is not handled properly.
> For ex: 
> array
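An illustrative case, assuming a decimal child column whose precision exceeds 18 (names and values are hypothetical):

```sql
-- precision 30 > 18, so the value no longer fits in a 64-bit long internally
create table dec_tab (arr array<decimal(30,10)>) stored as carbondata;
insert into dec_tab select array(cast('12345678901234567890.12' as decimal(30,10)));
select * from dec_tab;
```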





[jira] [Updated] (CARBONDATA-4112) Data mismatch issue in SI

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4112:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Data mismatch issue in SI
> -
>
> Key: CARBONDATA-4112
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4112
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> When the data files of an SI segment are merged, the SI table reports more 
> rows than the main table.
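A hedged way to observe the mismatch, reusing the SI segment-merge property cited elsewhere in this thread; the table names are illustrative:

```sql
-- with carbon.si.segment.merge=true, the counts below should still agree
select count(*) from maintable;
select count(*) from indextable;
-- the bug: the second count exceeds the first after SI data files are merged
```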





[jira] [Updated] (CARBONDATA-4109) Improve carbondata coverage for presto-integration code

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4109:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Improve carbondata coverage for presto-integration code
> ---
>
> Key: CARBONDATA-4109
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4109
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> A few scenarios were missing coverage in the presto-integration code. This PR 
> aims to improve coverage by considering all such scenarios.
> Dead code: ObjectStreamReader.java was created with the aim of querying 
> complex types, but ComplexTypeStreamReader was created instead, making 
> ObjectStreamReader obsolete.





[jira] [Updated] (CARBONDATA-4125) SI compatability issue fix

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4125:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> SI compatibility issue fix
> --
>
> Key: CARBONDATA-4125
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4125
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Refer 
> [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Bug-SI-Compatibility-Issue-td105485.html]
>  for this issue





[jira] [Updated] (CARBONDATA-4107) MV Performance and Lock issues

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4107:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> MV Performance and Lock issues
> --
>
> Key: CARBONDATA-4107
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4107
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> # After the MV multi-tenancy PR, the mv system folder was moved to database
> level. Hence, during each operation (insert/load/IUD/show mv/query), we list
> all databases in the system, collect the mv schemas, and check whether any mv
> is mapped to the table. This degrades query performance, because mv schemas
> are collected from all databases whether or not the table has an mv.
>  # When different JVM processes call the touchMDTFile method, file creation
> and deletion can happen at the same time, which may cause the operation to
> fail.
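The race in the second point can be avoided by making the "touch" atomic and tolerating a concurrent winner. Below is a minimal, illustrative Python sketch of that idea only; the real touchMDTFile is CarbonData Java code, and the marker file name used here is hypothetical:

```python
import os
import tempfile

def touch_atomically(path: str) -> bool:
    """Create a marker file atomically; tolerate a concurrent creator.

    Returns True if this process created the file, False if another
    process won the race. O_CREAT | O_EXCL guarantees that exactly one
    creator succeeds, so losing the race is not treated as a failure.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False  # another process created it first: not an error

# Demo: first touch wins, second touch observes the existing file.
workdir = tempfile.mkdtemp()
marker = os.path.join(workdir, "modifiedTime.mdt")  # hypothetical name
assert touch_atomically(marker) is True
assert touch_atomically(marker) is False
```

The same pattern applies on HDFS via a create-if-absent call; the key design choice is that "file already exists" is interpreted as success-by-another-process rather than an operation failure.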





[jira] [Updated] (CARBONDATA-4102) Add UT and FT to improve coverage of SI module.

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4102:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Add UT and FT to improve coverage of SI module.
> ---
>
> Key: CARBONDATA-4102
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4102
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> Add UT and FT to improve coverage of the SI module, and also remove dead or
> unused code if it exists.





[jira] [Updated] (CARBONDATA-4122) Support Writing Flink Stage data into Hdfs file system

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4122:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Support Writing Flink Stage data into Hdfs file system
> --
>
> Key: CARBONDATA-4122
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4122
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3917) The row count of data loading is not accurate; more rows have been loaded

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-3917:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> The row count of data loading is not accurate; more rows have been loaded
> ---
>
> Key: CARBONDATA-3917
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3917
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Taoli
>Priority: Blocker
> Fix For: 2.1.1
>
>
> 2020-07-18 18:46:23,856 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Data Writer: 1277745 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Sort Processor: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,856 | DEBUG | 
> [LocalFolderDeletionPool:detail_cdr_s1mme_18461_1595087183856] | 
> PrivilegedAction as:omm (auth:SIMPLE) 
> from:org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:298)
>  | 
> org.apache.hadoop.security.UserGroupInformation.logPrivilegedAction(UserGroupInformation.java:1756)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Data Converter: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)
> 2020-07-18 18:46:23,857 | INFO | [Executor task launch worker for task 28380] 
> | Total rows processed in step Input Processor: 1189959 | 
> org.apache.carbondata.processing.loading.AbstractDataLoadProcessorStep.close(AbstractDataLoadProcessorStep.java:138)





[jira] [Updated] (CARBONDATA-4115) Return and show segment ID after successful load and insert, including partitioned and normal tables

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4115:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Return and show segment ID after successful load and insert, including
> partitioned and normal tables
> --
>
> Key: CARBONDATA-4115
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4115
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Reporter: lihongzhao
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Return and show segment ID after successful load and insert, including
> partitioned and normal tables.





[jira] [Updated] (CARBONDATA-4089) Create table with location: if the location has no scheme, it defaults to the local file system rather than the file system defined by fs.defaultFS

2021-03-16 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4089:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Create table with location: if the location has no scheme, it defaults to the
> local file system rather than the file system defined by fs.defaultFS
> 
>
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Blocker
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the location does not specify a scheme, the file system defined by
> fs.defaultFS should be used.
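The expected behaviour can be sketched as follows. This is a hedged, self-contained Python illustration of the resolution rule only, not the actual CarbonData fix (which lives in the Java/Spark integration); `hdfs://hacluster` is an assumed value of fs.defaultFS taken from the examples elsewhere in this thread:

```python
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://hacluster"  # assumed fs.defaultFS value

def qualify_location(location: str, default_fs: str = DEFAULT_FS) -> str:
    """Prefix scheme-less table locations with the default filesystem.

    A location with an explicit scheme (file://, hdfs://, s3a://, ...)
    is kept as-is; a bare path is resolved against fs.defaultFS instead
    of silently falling back to the local file system.
    """
    if urlparse(location).scheme:
        return location           # explicit scheme wins, e.g. file:///tmp/t1
    return default_fs + location  # bare path -> hdfs://hacluster/...

assert qualify_location("/user/hive/t1") == "hdfs://hacluster/user/hive/t1"
assert qualify_location("file:///tmp/t1") == "file:///tmp/t1"
```

In Hadoop itself this is what `Path.getFileSystem(conf)` does when the configuration carries fs.defaultFS; the bug described here is the local file system being used instead of that configured default.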





[jira] [Updated] (CARBONDATA-4133) Concurrent Insert Overwrite with static partition on Index server fails

2021-03-15 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat updated CARBONDATA-4133:
-
Fix Version/s: (was: 2.2.0)
   2.1.1

> Concurrent Insert Overwrite with static partition on Index server fails
> ---
>
> Key: CARBONDATA-4133
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4133
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> [Steps] :-
> With the Index Server running, execute the concurrent insert overwrites with
> static partition.
>  
> Set 0:
> CREATE TABLE if not exists uniqdata_string(CUST_ID int,CUST_NAME String,DOB 
> timestamp,DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10),DECIMAL_COLUMN2 
> decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) PARTITIONED BY(ACTIVE_EMUI_VERSION string) STORED AS carbondata 
> TBLPROPERTIES ('TABLE_BLOCKSIZE'= '256 MB');
> Set 1:
> LOAD DATA INPATH 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into 
> table uniqdata_string partition(active_emui_version='abc') 
> OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> LOAD DATA INPATH 'hdfs://hacluster/datasets/2000_UniqData.csv' into table 
> uniqdata_string partition(active_emui_version='abc') 
> OPTIONS('FILEHEADER'='CUST_ID,CUST_NAME ,ACTIVE_EMUI_VERSION,DOB,DOJ, 
> BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1, 
> Double_COLUMN2,INTEGER_COLUMN1','BAD_RECORDS_ACTION'='FORCE');
> Set 2:
> CREATE TABLE if not exists uniqdata_hive (CUST_ID int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double, 
> INTEGER_COLUMN1 int)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> load data local inpath "/opt/csv/2000_UniqData.csv" into table uniqdata_hive;
> Set 3: (concurrent)
> insert overwrite table uniqdata_string partition(active_emui_version='abc') 
> select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, 
> decimal_column1, decimal_column2,double_column1, 
> double_column2,integer_column1 from uniqdata_hive limit 10;
> insert overwrite table uniqdata_string partition(active_emui_version='abc') 
> select CUST_ID, CUST_NAME,DOB,doj, bigint_column1, bigint_column2, 
> decimal_column1, decimal_column2,double_column1, 
> double_column2,integer_column1 from uniqdata_hive limit 10;
> [Expected Result] :- The insert should succeed for timestamp data in the Hive
> Carbon partition table.
>  
> [Actual Issue] : - Concurrent Insert Overwrite with static partition on Index 
> server fails
> (screenshot of the failure attached to the Jira issue)





[jira] [Resolved] (CARBONDATA-4115) Return and show segment ID after successful load and insert, including partitioned and normal tables

2021-03-02 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4115.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Return and show segment ID after successful load and insert, including
> partitioned and normal tables
> --
>
> Key: CARBONDATA-4115
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4115
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Reporter: lihongzhao
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Return and show segment ID after successful load and insert, including
> partitioned and normal tables.





[jira] [Resolved] (CARBONDATA-4122) Support Writing Flink Stage data into Hdfs file system

2021-02-10 Thread Ajantha Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajantha Bhat resolved CARBONDATA-4122.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Support Writing Flink Stage data into Hdfs file system
> --
>
> Key: CARBONDATA-4122
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4122
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>





