[jira] [Created] (CARBONDATA-3542) Support Map data type reading through Hive
dhatchayani created CARBONDATA-3542: --- Summary: Support Map data type reading through Hive Key: CARBONDATA-3542 URL: https://issues.apache.org/jira/browse/CARBONDATA-3542 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3511) Query time improvement by reducing the number of NameNode calls while having carbonindex files in the store
dhatchayani created CARBONDATA-3511: --- Summary: Query time improvement by reducing the number of NameNode calls while having carbonindex files in the store Key: CARBONDATA-3511 URL: https://issues.apache.org/jira/browse/CARBONDATA-3511 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (CARBONDATA-3451) Select aggregation query with filter fails on hive table with decimal type using CarbonHiveSerDe in Spark 2.1
[ https://issues.apache.org/jira/browse/CARBONDATA-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880140#comment-16880140 ] dhatchayani commented on CARBONDATA-3451: - Please check this again. It is already fixed in [CARBONDATA-3441|https://issues.apache.org/jira/browse/CARBONDATA-3441] > Select aggregation query with filter fails on hive table with decimal type > using CarbonHiveSerDe in Spark 2.1 > - > > Key: CARBONDATA-3451 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3451 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.6.0 > Environment: Spark 2.1 >Reporter: Chetan Bhat >Priority: Minor > > Test steps: > In Spark 2.1 beeline, the user creates a carbon table and loads data. > create table Test_Boundary (c1_int int,c2_Bigint Bigint,c3_Decimal > Decimal(38,38),c4_double double,c5_string string,c6_Timestamp > Timestamp,c7_Datatype_Desc string) STORED BY 'org.apache.carbondata.format' > TBLPROPERTIES('inverted_index'='c1_int,c2_Bigint,c5_string,c6_Timestamp','sort_columns'='c1_int,c2_Bigint,c5_string,c6_Timestamp'); > LOAD DATA INPATH 'hdfs://hacluster/chetan/Test_Data1.csv' INTO table > Test_Boundary > OPTIONS('DELIMITER'=',','QUOTECHAR'='','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'=''); > From hive beeline, the user creates a hive table from the already created carbon > table using CarbonHiveSerDe. 
> CREATE TABLE IF NOT EXISTS Test_Boundary1 (c1_int int,c2_Bigint > Bigint,c3_Decimal Decimal(38,38),c4_double double,c5_string > string,c6_Timestamp Timestamp,c7_Datatype_Desc string) ROW FORMAT SERDE > 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES > ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='Test_Boundary') > STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' > OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION > 'hdfs://hacluster//user/hive/warehouse/carbon.store/default/test_boundary'; > The user executes the below select aggregation queries on the hive table. > select min(c3_Decimal),max(c3_Decimal),sum(c3_Decimal),avg(c3_Decimal) , > count(c3_Decimal), variance(c3_Decimal) from test_boundary1 where > exp(c1_int)=0.0 or exp(c1_int)=1.0; > select min(c3_Decimal),max(c3_Decimal),sum(c3_Decimal),avg(c3_Decimal) , > count(c3_Decimal), variance(c3_Decimal) from test_boundary1 where > log(c1_int,1)=0.0 or log(c1_int,1) IS NULL; > select min(c3_Decimal),max(c3_Decimal),sum(c3_Decimal),avg(c3_Decimal) , > count(c3_Decimal), variance(c3_Decimal) from test_boundary1 where > pmod(c1_int,1)=0 or pmod(c1_int,1)IS NULL; > > Actual Result: Select aggregation query with filter fails on hive table with > decimal type using CarbonHiveSerDe in Spark 2.1. > Expected Result: Select aggregation query with filter should succeed on the > hive table with decimal type using CarbonHiveSerDe in Spark 2.1. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3455) Job Group ID is not displayed in the IndexServer
dhatchayani created CARBONDATA-3455: --- Summary: Job Group ID is not displayed in the IndexServer Key: CARBONDATA-3455 URL: https://issues.apache.org/jira/browse/CARBONDATA-3455 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Job Group ID is not displayed in the IndexServer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3443) Update hive guide with Read from hive
dhatchayani created CARBONDATA-3443: --- Summary: Update hive guide with Read from hive Key: CARBONDATA-3443 URL: https://issues.apache.org/jira/browse/CARBONDATA-3443 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3441) Aggregate queries are failing on Reading from Hive
dhatchayani created CARBONDATA-3441: --- Summary: Aggregate queries are failing on Reading from Hive Key: CARBONDATA-3441 URL: https://issues.apache.org/jira/browse/CARBONDATA-3441 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Aggregate queries are failing on Reading from Hive -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3415) Merge index is not working for partition table. Merge index for partition table is taking significantly longer than for a normal table.
dhatchayani created CARBONDATA-3415: --- Summary: Merge index is not working for partition table. Merge index for partition table is taking significantly longer than for a normal table. Key: CARBONDATA-3415 URL: https://issues.apache.org/jira/browse/CARBONDATA-3415 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Issues: (1) Merge index is not working on partition table (2) Time taken for merge index is significantly more than for a normal carbon table -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3406) Support Binary, Boolean, Varchar, Complex data types read and Dictionary columns read
dhatchayani created CARBONDATA-3406: --- Summary: Support Binary, Boolean, Varchar, Complex data types read and Dictionary columns read Key: CARBONDATA-3406 URL: https://issues.apache.org/jira/browse/CARBONDATA-3406 Project: CarbonData Issue Type: Bug Reporter: dhatchayani (1) Support all data types read (2) Support dictionary columns read -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3393) Merge Index Job Failure should not trigger the merge index job again. Exception propagation should be decided by the User.
[ https://issues.apache.org/jira/browse/CARBONDATA-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3393: Summary: Merge Index Job Failure should not trigger the merge index job again. Exception propagation should be decided by the User. (was: Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller.) > Merge Index Job Failure should not trigger the merge index job again. > Exception propagation should be decided by the User. > -- > > Key: CARBONDATA-3393 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3393 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > If the merge index job is failed, LOAD is also failing. Load should not > consider the merge index job status to decide the LOAD status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3393) Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller.
[ https://issues.apache.org/jira/browse/CARBONDATA-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3393: Summary: Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller. (was: Merge Index Job Failure should not fail the LOAD. Load status should not consider the merge index job status.) > Merge Index Job Failure should not trigger the merge index job again. > Exception should be propagated to the caller. > --- > > Key: CARBONDATA-3393 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3393 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > If the merge index job is failed, LOAD is also failing. Load should not > consider the merge index job status to decide the LOAD status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3393) Merge Index Job Failure should not fail the LOAD. Load status should not consider the merge index job status.
dhatchayani created CARBONDATA-3393: --- Summary: Merge Index Job Failure should not fail the LOAD. Load status should not consider the merge index job status. Key: CARBONDATA-3393 URL: https://issues.apache.org/jira/browse/CARBONDATA-3393 Project: CarbonData Issue Type: Bug Reporter: dhatchayani If the merge index job is failed, LOAD is also failing. Load should not consider the merge index job status to decide the LOAD status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
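The intended behaviour — the load status is decided first and is never revisited because of the merge index step — can be sketched as follows. This is a minimal, purely hypothetical Python illustration; `run_load` and `run_merge_index` are invented stand-ins, not CarbonData APIs:

```python
# Hypothetical sketch: the merge-index step runs after a successful load,
# and its failure must not change the already-decided load status.

def run_load(segment):
    # ... load data for the segment (assumed to succeed here) ...
    return "SUCCESS"

def run_merge_index(segment):
    # ... merge per-blocklet index files into one merged index file ...
    raise IOError("merge index job failed")

def load_with_merge_index(segment):
    status = run_load(segment)       # load status decided here, first
    try:
        run_merge_index(segment)     # best-effort post-load step
    except Exception as exc:
        # the failure is recorded, but the load status stays untouched
        print(f"merge index failed for {segment}: {exc}")
    return status

print(load_with_merge_index("segment_0"))
```

The design point is simply that the merge index job sits outside the transaction that decides the load status, so its exception is caught and reported rather than propagated into the load outcome.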
[jira] [Created] (CARBONDATA-3386) Concurrent Merge index and query is failing
dhatchayani created CARBONDATA-3386: --- Summary: Concurrent Merge index and query is failing Key: CARBONDATA-3386 URL: https://issues.apache.org/jira/browse/CARBONDATA-3386 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Concurrent merge index and query are failing. Load is triggered on a table, and at the end of the load, merge index is triggered. But this happens after the table status is updated as SUCCESS/PARTIAL SUCCESS for those segments, so the segment is already available to a concurrent query. Once the merge index is done, it deletes the index files that are still referenced by the query, which leads to the query failure. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
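The race described above can be modelled with a small, purely hypothetical Python sketch (no CarbonData APIs; file names and structures are invented for illustration). The segment becomes visible to queries before merge index replaces the index files, so a query that planned against the old files finds them gone:

```python
# Hypothetical model of the race: table status is updated (making the
# segment visible) BEFORE merge index deletes the per-blocklet index
# files, so a concurrent query can hold references to deleted files.

index_files = {"segment_0": ["0.carbonindex", "1.carbonindex"]}
visible_segments = set()

def commit_load(segment):
    # status -> SUCCESS: from here on, queries may read the segment
    visible_segments.add(segment)

def merge_index(segment):
    # the merged file replaces the individual index files, which are deleted
    index_files[segment] = ["0.carbonindexmerge"]

def query_plan(segment):
    # a query snapshots the index file list when it starts
    return list(index_files[segment])

commit_load("segment_0")            # segment becomes queryable
snapshot = query_plan("segment_0")  # concurrent query starts here
merge_index("segment_0")            # old index files are deleted
stale = [f for f in snapshot if f not in index_files["segment_0"]]
print(stale)  # the query now refers to files that no longer exist
```

Running merge index (or at least the file deletion) before the table status update would leave `stale` empty, since any query would only ever see the merged file.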
[jira] [Updated] (CARBONDATA-3364) Support Read from Hive. Queries are giving empty results from hive.
[ https://issues.apache.org/jira/browse/CARBONDATA-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3364: Summary: Support Read from Hive. Queries are giving empty results from hive. (was: Support Read from Hive. Queries on carbon table are giving empty results from hive.) > Support Read from Hive. Queries are giving empty results from hive. > --- > > Key: CARBONDATA-3364 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3364 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3364) Support Read from Hive. Queries on carbon table are giving empty results from hive.
dhatchayani created CARBONDATA-3364: --- Summary: Support Read from Hive. Queries on carbon table are giving empty results from hive. Key: CARBONDATA-3364 URL: https://issues.apache.org/jira/browse/CARBONDATA-3364 Project: CarbonData Issue Type: Bug Reporter: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement for count(*)
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Summary: Prune datamaps improvement for count(*) (was: Prune datamaps improvement) > Prune datamaps improvement for count(*) > --- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > +*Problem:*+ > (1) Currently for count(*), the prune is the same as for a select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow, which is unnecessary > and time-consuming. > (2) Pruning in a select * query spends time in convertToSafeRow() - > converting the DataMapRow to a safe row: in an unsafe row, to get the position of > a value, we need to traverse through the whole row to reach that position. > (3) In case of filter queries, whether the blocklet is valid or invalid, we > convert the DataMapRow to a safe row. This conversion is time-consuming and > grows with the number of blocklets. > > +*Solution:*+ > (1) We have the blocklet row count in the DataMapRow itself, so it is > enough to read the count. With this, count(*) query performance can be > improved. > (2) Maintain the data length also in the DataMapRow, so that traversing the > whole row can be avoided. With the length we can jump directly to the data > position. > (3) Read only the MinMax from the DataMapRow and decide whether a scan is required > on that blocklet; only if required is it converted to a safe row. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
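The solution steps above can be sketched with a hypothetical Python model of the DataMapRow metadata (the dict fields are invented for illustration, not the real on-heap layout): count(*) is answered by summing the stored per-blocklet row counts, and min/max alone decides whether a blocklet needs any further conversion:

```python
# Hypothetical sketch of the improvement: each DataMapRow already stores
# the blocklet's row count, so count(*) can sum those counts instead of
# materialising Blocklet/ExtendedBlocklet objects per blocklet.

datamap_rows = [
    {"blocklet_id": 0, "row_count": 32000, "min": 1,   "max": 500},
    {"blocklet_id": 1, "row_count": 28000, "min": 501, "max": 900},
]

def count_star(rows):
    # read only the row-count field; no blocklet objects are built
    return sum(r["row_count"] for r in rows)

def prune_with_minmax(rows, value):
    # read only min/max first; a row is converted further only if the
    # filter value can actually fall inside the blocklet's range
    return [r["blocklet_id"] for r in rows if r["min"] <= value <= r["max"]]

print(count_star(datamap_rows))              # 60000
print(prune_with_minmax(datamap_rows, 600))  # [1]
```

In this toy model, a filter on the value 600 skips blocklet 0 entirely: its min/max range (1..500) cannot match, so the expensive safe-row conversion would never happen for it.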
[jira] [Created] (CARBONDATA-3313) count(*) is not invalidating the invalid segments cache
dhatchayani created CARBONDATA-3313: --- Summary: count(*) is not invalidating the invalid segments cache Key: CARBONDATA-3313 URL: https://issues.apache.org/jira/browse/CARBONDATA-3313 Project: CarbonData Issue Type: Bug Reporter: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Summary: Prune datamaps improvement (was: Prune for count(*) improvement) > Prune datamaps improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Problem: > (1) Currently for count(*), the prune is same as select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need. > > Solution: > We have the blocklet row count in the DataMapRow itself, so it is just enough > to read the count. With this count(*) query performance can be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Description: +*Problem:*+ (1) Currently for count ( *) , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need and it is a time consuming process. (2) Pruning in select * query consumes time in convertToSafeRow() - converting the DataMapRow to safe as in an unsafe row to get the position of data, we need to traverse through the whole row to reach a position. (3) In case of filter queries, even if the blocklet is valid or invalid, we are converting the DataMapRow to safeRow. This conversion is time consuming increasing the number of blocklets. +*Solution:*+ (1) We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count ( *) query performance can be improved. (2) Maintain the data length also to the DataMapRow, so that traversing the whole row can be avoided. With the length we can directly hit the data position. (3) Read only the MinMax from the DataMapRow, decide whether scan is required on that blocklet, if required only then it can be converted to safeRow, if needed. was: Problem: (1) Currently for count ( *) , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need and it is a time consuming process. (2) Pruning in select * query consumes time in convertToSafeRow() - converting the DataMapRow to safe as in an unsafe row to get the position of data, we need to traverse through the whole row to reach a position. (3) In case of filter queries, even if the blocklet is valid or invalid, we are converting the DataMapRow to safeRow. This conversion is time consuming increasing the number of blocklets. Solution: (1) We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. 
With this count ( *) query performance can be improved. (2) Maintain the data length also to the DataMapRow, so that traversing the whole row can be avoided. With the length we can directly hit the data position. (3) Read only the MinMax from the DataMapRow, decide whether scan is required on that blocklet, if required only then it can be converted to safeRow, if needed. > Prune datamaps improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > +*Problem:*+ > (1) Currently for count ( *) , the prune is same as select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need > and it is a time consuming process. > (2) Pruning in select * query consumes time in convertToSafeRow() - > converting the DataMapRow to safe as in an unsafe row to get the position of > data, we need to traverse through the whole row to reach a position. > (3) In case of filter queries, even if the blocklet is valid or invalid, we > are converting the DataMapRow to safeRow. This conversion is time consuming > increasing the number of blocklets. > > +*Solution:*+ > (1) We have the blocklet row count in the DataMapRow itself, so it is just > enough to read the count. With this count ( *) query performance can be > improved. > (2) Maintain the data length also to the DataMapRow, so that traversing the > whole row can be avoided. With the length we can directly hit the data > position. > (3) Read only the MinMax from the DataMapRow, decide whether scan is required > on that blocklet, if required only then it can be converted to safeRow, if > needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Description: Problem: (1) Currently for count ( *) , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need and it is a time consuming process. (2) Pruning in select * query consumes time in convertToSafeRow() - converting the DataMapRow to safe as in an unsafe row to get the position of data, we need to traverse through the whole row to reach a position. (3) In case of filter queries, even if the blocklet is valid or invalid, we are converting the DataMapRow to safeRow. This conversion is time consuming increasing the number of blocklets. Solution: (1) We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count ( *) query performance can be improved. (2) Maintain the data length also to the DataMapRow, so that traversing the whole row can be avoided. With the length we can directly hit the data position. (3) Read only the MinMax from the DataMapRow, decide whether scan is required on that blocklet, if required only then it can be converted to safeRow, if needed. was: Problem: (1) Currently for count ( *) , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. > Prune datamaps improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Problem: > (1) Currently for count ( *) , the prune is same as select * query. 
Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need > and it is a time consuming process. > (2) Pruning in select * query consumes time in convertToSafeRow() - > converting the DataMapRow to safe as in an unsafe row to get the position of > data, we need to traverse through the whole row to reach a position. > (3) In case of filter queries, even if the blocklet is valid or invalid, we > are converting the DataMapRow to safeRow. This conversion is time consuming > increasing the number of blocklets. > > Solution: > (1) We have the blocklet row count in the DataMapRow itself, so it is just > enough to read the count. With this count ( *) query performance can be > improved. > (2) Maintain the data length also to the DataMapRow, so that traversing the > whole row can be avoided. With the length we can directly hit the data > position. > (3) Read only the MinMax from the DataMapRow, decide whether scan is required > on that blocklet, if required only then it can be converted to safeRow, if > needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Description: Problem: (1) Currently for count * , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. was: Problem: (1) Currently for count(*), the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. > Prune datamaps improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Problem: > (1) Currently for count * , the prune is same as select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need. > > Solution: > We have the blocklet row count in the DataMapRow itself, so it is just enough > to read the count. With this count(*) query performance can be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune datamaps improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Description: Problem: (1) Currently for count ( *) , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. was: Problem: (1) Currently for count * , the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. > Prune datamaps improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Problem: > (1) Currently for count ( *) , the prune is same as select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need. > > Solution: > We have the blocklet row count in the DataMapRow itself, so it is just enough > to read the count. With this count(*) query performance can be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3293) Prune for count(*) improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3293: Description: Problem: (1) Currently for count(*), the prune is same as select * query. Blocklet and ExtendedBlocklet are formed from the DataMapRow and that is of no need. Solution: We have the blocklet row count in the DataMapRow itself, so it is just enough to read the count. With this count(*) query performance can be improved. > Prune for count(*) improvement > -- > > Key: CARBONDATA-3293 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Problem: > (1) Currently for count(*), the prune is same as select * query. Blocklet > and ExtendedBlocklet are formed from the DataMapRow and that is of no need. > > Solution: > We have the blocklet row count in the DataMapRow itself, so it is just enough > to read the count. With this count(*) query performance can be improved. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3293) Prune for count(*) improvement
dhatchayani created CARBONDATA-3293: --- Summary: Prune for count(*) improvement Key: CARBONDATA-3293 URL: https://issues.apache.org/jira/browse/CARBONDATA-3293 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3241) Refactor the requested scan columns and the projection columns
dhatchayani created CARBONDATA-3241: --- Summary: Refactor the requested scan columns and the projection columns Key: CARBONDATA-3241 URL: https://issues.apache.org/jira/browse/CARBONDATA-3241 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2755) Compaction of Complex DataType (STRUCT AND ARRAY)
[ https://issues.apache.org/jira/browse/CARBONDATA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2755: Summary: Compaction of Complex DataType (STRUCT AND ARRAY) (was: Compaction of Complex DataType) > Compaction of Complex DataType (STRUCT AND ARRAY) > - > > Key: CARBONDATA-2755 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2755 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Complex Type Enhancements - Compaction of Complex DataType -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2755) Compaction of Complex DataType
[ https://issues.apache.org/jira/browse/CARBONDATA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714748#comment-16714748 ] dhatchayani commented on CARBONDATA-2755: - https://issues.apache.org/jira/browse/CARBONDATA-3160 Jira to extend compaction support with MAP type > Compaction of Complex DataType > -- > > Key: CARBONDATA-2755 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2755 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Complex Type Enhancements - Compaction of Complex DataType -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3160) Compaction support with MAP data type
[ https://issues.apache.org/jira/browse/CARBONDATA-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3160: Description: Support compaction with MAP type > Compaction support with MAP data type > - > > Key: CARBONDATA-3160 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3160 > Project: CarbonData > Issue Type: Sub-task >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > > Support compaction with MAP type -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3160) Compaction support with MAP data type
dhatchayani created CARBONDATA-3160: --- Summary: Compaction support with MAP data type Key: CARBONDATA-3160 URL: https://issues.apache.org/jira/browse/CARBONDATA-3160 Project: CarbonData Issue Type: Sub-task Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2605) Complex DataType Enhancements
[ https://issues.apache.org/jira/browse/CARBONDATA-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-2605: --- Assignee: dhatchayani > Complex DataType Enhancements > - > > Key: CARBONDATA-2605 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2605 > Project: CarbonData > Issue Type: Improvement >Reporter: sounak chakraborty >Assignee: dhatchayani >Priority: Major > Attachments: Complex Data Type Enhancements.pdf > > > Umbrella Jira to implement enhancements in Complex Data Type for Carbon. > * Projection push down for struct data type. > * Provide adaptive encoding and decoding for all data types. > * Support JSON data loading directly into a Carbon table. > > Please access the Design Document through this link. > > [https://docs.google.com/document/d/12EZwUlLs53Vro7pMeLnFd0lCjeKOakKY-60e3cryJb4/edit#|https://docs.google.com/document/d/12EZwUlLs53Vro7pMeLnFd0lCjeKOakKY-60e3cryJb4/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2755) Compaction of Complex DataType
[ https://issues.apache.org/jira/browse/CARBONDATA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714745#comment-16714745 ] dhatchayani commented on CARBONDATA-2755: - This Jira is to support compaction with STRUCT and ARRAY type. For MAP type, a new Jira will be raised. > Compaction of Complex DataType > -- > > Key: CARBONDATA-2755 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2755 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Complex Type Enhancements - Compaction of Complex DataType -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2755) Compaction of Complex DataType
[ https://issues.apache.org/jira/browse/CARBONDATA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-2755: --- Assignee: dhatchayani (was: sounak chakraborty) > Compaction of Complex DataType > -- > > Key: CARBONDATA-2755 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2755 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Complex Type Enhancements - Compaction of Complex DataType -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3145) Avoid duplicate decoding for complex column pages while querying
[ https://issues.apache.org/jira/browse/CARBONDATA-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3145: Summary: Avoid duplicate decoding for complex column pages while querying (was: Read improvement for complex column pages while querying) > Avoid duplicate decoding for complex column pages while querying > > > Key: CARBONDATA-3145 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3145 > Project: CarbonData > Issue Type: Sub-task >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3145) Read improvement for complex column pages while querying
dhatchayani created CARBONDATA-3145: --- Summary: Read improvement for complex column pages while querying Key: CARBONDATA-3145 URL: https://issues.apache.org/jira/browse/CARBONDATA-3145 Project: CarbonData Issue Type: Sub-task Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3131) Update the requested columns to the Scan
dhatchayani created CARBONDATA-3131: --- Summary: Update the requested columns to the Scan Key: CARBONDATA-3131 URL: https://issues.apache.org/jira/browse/CARBONDATA-3131 Project: CarbonData Issue Type: Sub-task Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3117) Rearrange the projection list in the Scan
dhatchayani created CARBONDATA-3117: --- Summary: Rearrange the projection list in the Scan Key: CARBONDATA-3117 URL: https://issues.apache.org/jira/browse/CARBONDATA-3117 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3096) Wrong records size on the input metrics
[ https://issues.apache.org/jira/browse/CARBONDATA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3096: Description: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. h1. +*Steps to reproduce:*+ spark.sql("DROP TABLE IF EXISTS person") spark.sql("create table person (id int, name string) stored by 'carbondata'") spark.sql("insert into person select 1,'a'") spark.sql("select * from person").show(false) !3096.PNG! (2) The intermediate page used to sort in adaptive encoding should be freed. was: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. Steps to reproduce: spark.sql("DROP TABLE IF EXISTS person") spark.sql("create table person (id int, name string) stored by 'carbondata'") spark.sql("insert into person select 1,'a'") spark.sql("select * from person").show(false) !3096.PNG! (2) The intermediate page used to sort in adaptive encoding should be freed. > Wrong records size on the input metrics > --- > > Key: CARBONDATA-3096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: 3096.PNG > > Time Spent: 9h 20m > Remaining Estimate: 0h > > (1) Scanned record result size is being taken from the default batch size. It > should be taken from the records scanned. > h1. +*Steps to reproduce:*+ > spark.sql("DROP TABLE IF EXISTS person") > spark.sql("create table person (id int, name string) stored by 'carbondata'") > spark.sql("insert into person select 1,'a'") > spark.sql("select * from person").show(false) > !3096.PNG! > > (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3096) Wrong records size on the input metrics
[ https://issues.apache.org/jira/browse/CARBONDATA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3096: Summary: Wrong records size on the input metrics (was: Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding) > Wrong records size on the input metrics > --- > > Key: CARBONDATA-3096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: 3096.PNG > > Time Spent: 9h 20m > Remaining Estimate: 0h > > (1) Scanned record result size is being taken from the default batch size. It > should be taken from the records scanned. > Steps to reproduce: > spark.sql("DROP TABLE IF EXISTS person") > spark.sql("create table person (id int, name string) stored by 'carbondata'") > spark.sql("insert into person select 1,'a'") > spark.sql("select * from person").show(false) > !3096.PNG! > > (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3096) Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding
[ https://issues.apache.org/jira/browse/CARBONDATA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3096: Description: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. Steps to reproduce: spark.sql("DROP TABLE IF EXISTS person") spark.sql("create table person (id int, name string) stored by 'carbondata'") spark.sql("insert into person select 1,'a'") spark.sql("select * from person").show(false) !3096.PNG! (2) The intermediate page used to sort in adaptive encoding should be freed. was: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. Steps to reproduce: spark.sql("DROP TABLE IF EXISTS person") spark.sql("create table person (id int, name string) stored by 'carbondata'") spark.sql("insert into person select 1,'a'") spark.sql("select * from person").show(false)
query_id: 29127036821854 | task_id: 0 | start_time: 2018-11-16 20:22:56.573 | total_time: 1430ms | load_blocks_time: 100ms | load_dictionary_time: 0ms | carbon_scan_time: 13 | carbon_IO_time: 102 | scan_blocks_num: 1 | total_blocklets: 1 | valid_blocklets: 1 | total_pages: 1 | scanned_pages: 0 | valid_pages: 1 | result_size: +*64000*+ | key_column_filling_time: 0 | measure_filling_time: 0 | page_uncompress_time: 927 | result_preparation_time: 0
(2) The intermediate page used to sort in adaptive encoding should be freed.
> Wrong records size on the input metrics & Free the intermediate page used > while adaptive encoding > - > > Key: CARBONDATA-3096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: 3096.PNG > > Time Spent: 9h 20m > Remaining Estimate: 0h > > (1) Scanned record result size is being taken from the default batch size. It > should be taken from the records scanned. > Steps to reproduce: > spark.sql("DROP TABLE IF EXISTS person") > spark.sql("create table person (id int, name string) stored by 'carbondata'") > spark.sql("insert into person select 1,'a'") > spark.sql("select * from person").show(false) > !3096.PNG! > > (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3096) Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding
[ https://issues.apache.org/jira/browse/CARBONDATA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3096: Attachment: 3096.PNG > Wrong records size on the input metrics & Free the intermediate page used > while adaptive encoding > - > > Key: CARBONDATA-3096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: 3096.PNG > > Time Spent: 9h 20m > Remaining Estimate: 0h > > (1) Scanned record result size is being taken from the default batch size. It > should be taken from the records scanned. > Steps to reproduce: > spark.sql("DROP TABLE IF EXISTS person") > spark.sql("create table person (id int, name string) stored by 'carbondata'") > spark.sql("insert into person select 1,'a'") > spark.sql("select * from person").show(false) >
> query_id: 29127036821854 | task_id: 0 | start_time: 2018-11-16 20:22:56.573 | total_time: 1430ms | load_blocks_time: 100ms | load_dictionary_time: 0ms | carbon_scan_time: 13 | carbon_IO_time: 102 | scan_blocks_num: 1 | total_blocklets: 1 | valid_blocklets: 1 | total_pages: 1 | scanned_pages: 0 | valid_pages: 1 | result_size: +*64000*+ | key_column_filling_time: 0 | measure_filling_time: 0 | page_uncompress_time: 927 | result_preparation_time: 0
> (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3096) Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding
[ https://issues.apache.org/jira/browse/CARBONDATA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3096: Description: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. Steps to reproduce: spark.sql("DROP TABLE IF EXISTS person") spark.sql("create table person (id int, name string) stored by 'carbondata'") spark.sql("insert into person select 1,'a'") spark.sql("select * from person").show(false)
query_id: 29127036821854 | task_id: 0 | start_time: 2018-11-16 20:22:56.573 | total_time: 1430ms | load_blocks_time: 100ms | load_dictionary_time: 0ms | carbon_scan_time: 13 | carbon_IO_time: 102 | scan_blocks_num: 1 | total_blocklets: 1 | valid_blocklets: 1 | total_pages: 1 | scanned_pages: 0 | valid_pages: 1 | result_size: +*64000*+ | key_column_filling_time: 0 | measure_filling_time: 0 | page_uncompress_time: 927 | result_preparation_time: 0
(2) The intermediate page used to sort in adaptive encoding should be freed. was: (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. (2) The intermediate page used to sort in adaptive encoding should be freed. > Wrong records size on the input metrics & Free the intermediate page used > while adaptive encoding > - > > Key: CARBONDATA-3096 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 9h 20m > Remaining Estimate: 0h > > (1) Scanned record result size is being taken from the default batch size. It > should be taken from the records scanned.
> Steps to reproduce: > spark.sql("DROP TABLE IF EXISTS person") > spark.sql("create table person (id int, name string) stored by 'carbondata'") > spark.sql("insert into person select 1,'a'") > spark.sql("select * from person").show(false) >
> query_id: 29127036821854 | task_id: 0 | start_time: 2018-11-16 20:22:56.573 | total_time: 1430ms | load_blocks_time: 100ms | load_dictionary_time: 0ms | carbon_scan_time: 13 | carbon_IO_time: 102 | scan_blocks_num: 1 | total_blocklets: 1 | valid_blocklets: 1 | total_pages: 1 | scanned_pages: 0 | valid_pages: 1 | result_size: +*64000*+ | key_column_filling_time: 0 | measure_filling_time: 0 | page_uncompress_time: 927 | result_preparation_time: 0
> (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3096) Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding
dhatchayani created CARBONDATA-3096: --- Summary: Wrong records size on the input metrics & Free the intermediate page used while adaptive encoding Key: CARBONDATA-3096 URL: https://issues.apache.org/jira/browse/CARBONDATA-3096 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani (1) Scanned record result size is being taken from the default batch size. It should be taken from the records scanned. (2) The intermediate page used to sort in adaptive encoding should be freed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
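The result-size bug above can be sketched in a few lines. This is an illustrative Python model, not CarbonData's actual Java scanner; the batch size constant and function name are invented for the example. The point is that the metric must count rows actually filled into each result batch, not the batch's default capacity (the bug report observed 64000 reported for a one-row table):

```python
# Hypothetical sketch of the input-metrics fix: count the rows actually
# filled into each result batch, not the batch's default capacity.
DEFAULT_BATCH_SIZE = 32000  # illustrative value, not Carbon's exact default

def scan_row_count(rows, batch_size=DEFAULT_BATCH_SIZE):
    scanned = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Buggy behaviour added batch_size (the capacity) here;
        # the fix adds the number of rows actually present in the batch.
        scanned += len(batch)
    return scanned

# A table holding a single row must report 1 scanned record, not 32000.
print(scan_row_count([(1, 'a')]))
```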
[jira] [Updated] (CARBONDATA-3023) Alter add column issue with SORT_COLUMNS
[ https://issues.apache.org/jira/browse/CARBONDATA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-3023: Summary: Alter add column issue with SORT_COLUMNS (was: Alter add column issue with reading a row) > Alter add column issue with SORT_COLUMNS > > > Key: CARBONDATA-3023 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3023 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3023) Alter add column issue with reading a row
dhatchayani created CARBONDATA-3023: --- Summary: Alter add column issue with reading a row Key: CARBONDATA-3023 URL: https://issues.apache.org/jira/browse/CARBONDATA-3023 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3022) Refactor ColumnPageWrapper
dhatchayani created CARBONDATA-3022: --- Summary: Refactor ColumnPageWrapper Key: CARBONDATA-3022 URL: https://issues.apache.org/jira/browse/CARBONDATA-3022 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2998) Refresh column schema for old store(before V3) for SORT_COLUMNS option
[ https://issues.apache.org/jira/browse/CARBONDATA-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2998: Summary: Refresh column schema for old store(before V3) for SORT_COLUMNS option (was: Refresh column schema for old store for SORT_COLUMNS option) > Refresh column schema for old store(before V3) for SORT_COLUMNS option > -- > > Key: CARBONDATA-2998 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2998 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2998) Refresh column schema for old store for SORT_COLUMNS option
dhatchayani created CARBONDATA-2998: --- Summary: Refresh column schema for old store for SORT_COLUMNS option Key: CARBONDATA-2998 URL: https://issues.apache.org/jira/browse/CARBONDATA-2998 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2975) DefaultValue choosing and removeNullValues on range filters is incorrect
[ https://issues.apache.org/jira/browse/CARBONDATA-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2975: Summary: DefaultValue choosing and removeNullValues on range filters is incorrect (was: DefaultValue choosing and removeNullValues on range filters is inorrect) > DefaultValue choosing and removeNullValues on range filters is incorrect > > > Key: CARBONDATA-2975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2975) DefaultValue choosing and removeNullValues on range filters is inorrect
dhatchayani created CARBONDATA-2975: --- Summary: DefaultValue choosing and removeNullValues on range filters is inorrect Key: CARBONDATA-2975 URL: https://issues.apache.org/jira/browse/CARBONDATA-2975 Project: CarbonData Issue Type: Bug Reporter: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2947) Adaptive encoding support for timestamp no dictionary and Refactor ColumnPageWrapper
[ https://issues.apache.org/jira/browse/CARBONDATA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2947: Summary: Adaptive encoding support for timestamp no dictionary and Refactor ColumnPageWrapper (was: Adaptive encoding support for timestamp no dictionary) > Adaptive encoding support for timestamp no dictionary and Refactor > ColumnPageWrapper > > > Key: CARBONDATA-2947 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2947 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2946) Unify conversion while writing to Bloom
[ https://issues.apache.org/jira/browse/CARBONDATA-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2946: Summary: Unify conversion while writing to Bloom (was: Unify conversion while writing to Bloom and Refactor ColumnPageWrapper) > Unify conversion while writing to Bloom > --- > > Key: CARBONDATA-2946 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2946 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2946) Unify conversion while writing to Bloom and Refactor ColumnPageWrapper
[ https://issues.apache.org/jira/browse/CARBONDATA-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2946: Summary: Unify conversion while writing to Bloom and Refactor ColumnPageWrapper (was: Bloom filter backward compatibility with adaptive encoding and Refactor) > Unify conversion while writing to Bloom and Refactor ColumnPageWrapper > -- > > Key: CARBONDATA-2946 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2946 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2947) Adaptive encoding support for timestamp no dictionary
dhatchayani created CARBONDATA-2947: --- Summary: Adaptive encoding support for timestamp no dictionary Key: CARBONDATA-2947 URL: https://issues.apache.org/jira/browse/CARBONDATA-2947 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2946) Bloom filter backward compatibility with adaptive encoding and Refactor
dhatchayani created CARBONDATA-2946: --- Summary: Bloom filter backward compatibility with adaptive encoding and Refactor Key: CARBONDATA-2946 URL: https://issues.apache.org/jira/browse/CARBONDATA-2946 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2896) Adaptive encoding for primitive data types
[ https://issues.apache.org/jira/browse/CARBONDATA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2896: Description: Currently, Encoding and Decoding are present only for Dictionary and Measure Columns, but for no-dictionary Primitive types encoding is *absent.* *Encoding is a technique used to reduce the storage size; after all these encodings, the result is compressed with Snappy compression to further reduce the storage size.* *With this feature, we support encoding on no-dictionary primitive data types as well.* > Adaptive encoding for primitive data types > -- > > Key: CARBONDATA-2896 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2896 > Project: CarbonData > Issue Type: New Feature >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Currently, Encoding and Decoding are present only for Dictionary and Measure > Columns, but for no-dictionary Primitive types encoding is *absent.* > *Encoding is a technique used to reduce the storage size; after all these > encodings, the result is compressed with Snappy compression to further reduce > the storage size.* > *With this feature, we support encoding on no-dictionary primitive data > types as well.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2896) Adaptive encoding for primitive data types
[ https://issues.apache.org/jira/browse/CARBONDATA-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2896: Description: Currently, Encoding and Decoding are present only for Dictionary and Measure Columns, but for no-dictionary Primitive types encoding is *absent.* Encoding is a technique used to reduce the storage size; after all these encodings, the result is compressed with Snappy compression to further reduce the storage size. With this feature, we support encoding on no-dictionary primitive data types as well. was: Currently, Encoding and Decoding are present only for Dictionary and Measure Columns, but for no-dictionary Primitive types encoding is *absent.* *Encoding is a technique used to reduce the storage size; after all these encodings, the result is compressed with Snappy compression to further reduce the storage size.* *With this feature, we support encoding on no-dictionary primitive data types as well.* > Adaptive encoding for primitive data types > -- > > Key: CARBONDATA-2896 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2896 > Project: CarbonData > Issue Type: New Feature >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > Currently, Encoding and Decoding are present only for Dictionary and Measure > Columns, but for no-dictionary Primitive types encoding is *absent.* > Encoding is a technique used to reduce the storage size; after all these > encodings, the result is compressed with Snappy compression to further reduce > the storage size. > With this feature, we support encoding on no-dictionary primitive data > types as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
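As a rough illustration of the adaptive-encoding idea described above, one can pick the narrowest integer width that covers the observed min/max, store deltas from the minimum, and compress the packed bytes. This is a hedged sketch, not CarbonData's actual codec: the function names are invented, and zlib stands in for the Snappy compression the description mentions.

```python
import struct
import zlib  # stands in for Snappy here; CarbonData uses Snappy


def adaptive_encode(values):
    """Store deltas from the minimum in the smallest unsigned width that fits."""
    lo = min(values)
    span = max(values) - lo
    for fmt, width in (("B", 1), ("H", 2), ("I", 4), ("Q", 8)):
        if span < 1 << (8 * width):
            break  # narrowest width covering the observed range
    packed = struct.pack(f"<{len(values)}{fmt}", *(v - lo for v in values))
    return lo, fmt, zlib.compress(packed)


def adaptive_decode(lo, fmt, blob, count):
    deltas = struct.unpack(f"<{count}{fmt}", zlib.decompress(blob))
    return [lo + d for d in deltas]
```

For example, the values [1000, 1001, 1255] span only 255, so they pack into one byte each before compression instead of the eight a raw long would take.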
[jira] [Created] (CARBONDATA-2896) Adaptive encoding for primitive data types
dhatchayani created CARBONDATA-2896: --- Summary: Adaptive encoding for primitive data types Key: CARBONDATA-2896 URL: https://issues.apache.org/jira/browse/CARBONDATA-2896 Project: CarbonData Issue Type: New Feature Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2829) Fix creating merge index on older V1 V2 store
dhatchayani created CARBONDATA-2829: --- Summary: Fix creating merge index on older V1 V2 store Key: CARBONDATA-2829 URL: https://issues.apache.org/jira/browse/CARBONDATA-2829 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani Block creating merge index on older V1 V2 version -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2812) Implement freeMemory for complex pages
[ https://issues.apache.org/jira/browse/CARBONDATA-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2812: Summary: Implement freeMemory for complex pages (was: Implement free memory for complex pages ) > Implement freeMemory for complex pages > --- > > Key: CARBONDATA-2812 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2812 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2812) Implement free memory for complex pages
dhatchayani created CARBONDATA-2812: --- Summary: Implement free memory for complex pages Key: CARBONDATA-2812 URL: https://issues.apache.org/jira/browse/CARBONDATA-2812 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2808) Insert into select is crashing as both are sharing the same task context
dhatchayani created CARBONDATA-2808: --- Summary: Insert into select is crashing as both are sharing the same task context Key: CARBONDATA-2808 URL: https://issues.apache.org/jira/browse/CARBONDATA-2808 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani Insert into select fails because both operations run as the same task and share the same TaskContext, so resources are cleared once either RDD's task completes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
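The failure mode described above is a generic shared-resource hazard: two readers registered against one task context, where the first completion callback clears resources the other side still needs. A minimal sketch with invented names (not Spark's or CarbonData's actual classes):

```python
# Minimal model of the shared-TaskContext hazard. Class and field names
# are illustrative only; they do not mirror Spark or CarbonData internals.
class TaskContext:
    def __init__(self):
        self.resources = {"sort_memory": bytearray(16)}
        self._callbacks = []

    def add_task_completion_listener(self, cb):
        self._callbacks.append(cb)

    def complete(self):
        for cb in self._callbacks:
            cb(self)


ctx = TaskContext()
# Both the "select" side and the "insert" side share one context,
# and each registers a cleanup callback against it.
ctx.add_task_completion_listener(lambda c: c.resources.clear())

# The "select" side finishes first and fires completion...
ctx.complete()
# ...so the "insert" side now finds the shared resources already cleared.
assert "sort_memory" not in ctx.resources
```

Giving each logical job its own context (or reference-counting the shared resources) removes the hazard.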
[jira] [Updated] (CARBONDATA-2714) Support merge index files for the segment
[ https://issues.apache.org/jira/browse/CARBONDATA-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2714: Description: We have already discussed the merge index advantages in the community. Please find the link below. [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Merging-carbonindex-files-for-each-segments-and-across-segments-td24441.html] But the feature is not complete and has some gaps: it is not yet supported for some features such as the pre-aggregate table and the streaming table. In this JIRA, the merge index feature will be completed by supporting all the existing impacted features. > Support merge index files for the segment > - > > Key: CARBONDATA-2714 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2714 > Project: CarbonData > Issue Type: Sub-task >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > We have already discussed the merge index advantages in the community. > Please find the link below. > [http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Merging-carbonindex-files-for-each-segments-and-across-segments-td24441.html] > But the feature is not complete and has some gaps: it is not yet supported > for some features such as the pre-aggregate table and the streaming table. > In this JIRA, the merge index feature will be completed by supporting all the > existing impacted features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
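Conceptually, merging index files turns many small per-block index files into a single file per segment plus an offset map, so a reader opens one file instead of many (which is what cuts the NameNode calls on HDFS). The sketch below is a hedged illustration of that layout idea only; the function names are invented and it does not resemble CarbonData's actual on-disk merge-index format.

```python
# Hedged sketch of the merge-index idea: concatenate many small index
# files into one blob plus an offset map, so readers open a single file.
def merge_index_files(index_files):
    """index_files: dict of {file_name: bytes}. Returns (blob, offsets)."""
    blob = bytearray()
    offsets = {}
    for name, data in sorted(index_files.items()):
        offsets[name] = (len(blob), len(data))  # (start, length) in the blob
        blob.extend(data)
    return bytes(blob), offsets


def read_index(blob, offsets, name):
    """Fetch one original index file's bytes out of the merged blob."""
    start, length = offsets[name]
    return blob[start:start + length]
```

With this layout a segment with hundreds of index files needs one open/read instead of hundreds, at the cost of carrying the offset map.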
[jira] [Created] (CARBONDATA-2714) Support merge index files for the segment
dhatchayani created CARBONDATA-2714: --- Summary: Support merge index files for the segment Key: CARBONDATA-2714 URL: https://issues.apache.org/jira/browse/CARBONDATA-2714 Project: CarbonData Issue Type: Sub-task Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2704) Index file size in describe formatted command is not updated correctly with the segment file
[ https://issues.apache.org/jira/browse/CARBONDATA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2704: Summary: Index file size in describe formatted command is not updated correctly with the segment file (was: Index file size in describe formatted command is not updated correctly according to the segment file) > Index file size in describe formatted command is not updated correctly with > the segment file > > > Key: CARBONDATA-2704 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2704 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2704) Index file size in describe formatted command is not updated correctly according to the segment file
[ https://issues.apache.org/jira/browse/CARBONDATA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2704: Summary: Index file size in describe formatted command is not updated correctly according to the segment file (was: Index file size in describe formatted command is not updated correctly) > Index file size in describe formatted command is not updated correctly > according to the segment file > > > Key: CARBONDATA-2704 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2704 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2704) Index file size in describe formatted command is not updated correctly
dhatchayani created CARBONDATA-2704: --- Summary: Index file size in describe formatted command is not updated correctly Key: CARBONDATA-2704 URL: https://issues.apache.org/jira/browse/CARBONDATA-2704 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2571) Calculating the carbonindex and carbondata file size of a table is wrong
dhatchayani created CARBONDATA-2571: --- Summary: Calculating the carbonindex and carbondata file size of a table is wrong Key: CARBONDATA-2571 URL: https://issues.apache.org/jira/browse/CARBONDATA-2571 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2482) Pass uuid while writing segment file if possible
dhatchayani created CARBONDATA-2482: --- Summary: Pass uuid while writing segment file if possible Key: CARBONDATA-2482 URL: https://issues.apache.org/jira/browse/CARBONDATA-2482 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2470) Refactor AlterTableCompactionPostStatusUpdateEvent usage in compaction flow
dhatchayani created CARBONDATA-2470: --- Summary: Refactor AlterTableCompactionPostStatusUpdateEvent usage in compaction flow Key: CARBONDATA-2470 URL: https://issues.apache.org/jira/browse/CARBONDATA-2470 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani AlterTableCompactionPostStatusUpdateEvent in the compaction flow is controlled only by the pre-aggregate listener. If the CommitPreAggregateListener sets the commitComplete property to true, this event will not be fired for the next iteration. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2467) Null is printed in the SDK writer logs for operations logged
dhatchayani created CARBONDATA-2467: --- Summary: Null is printed in the SDK writer logs for operations logged Key: CARBONDATA-2467 URL: https://issues.apache.org/jira/browse/CARBONDATA-2467 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: Rahul Kumar Expected Output: Null should not be printed in the SDK writer logs. Actual Output: Null is printed in the SDK writer logs for operations logged as shown below. This is confusing for the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2458) Remove unnecessary TableProvider interface
dhatchayani created CARBONDATA-2458: --- Summary: Remove unnecessary TableProvider interface Key: CARBONDATA-2458 URL: https://issues.apache.org/jira/browse/CARBONDATA-2458 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2448) Adding compacted segments to load and alter events
dhatchayani created CARBONDATA-2448: --- Summary: Adding compacted segments to load and alter events Key: CARBONDATA-2448 URL: https://issues.apache.org/jira/browse/CARBONDATA-2448 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2362) Changing the Cacheable object from DataMap to Wrapper
dhatchayani created CARBONDATA-2362: --- Summary: Changing the Cacheable object from DataMap to Wrapper Key: CARBONDATA-2362 URL: https://issues.apache.org/jira/browse/CARBONDATA-2362 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2310) Refactored code to improve Distributable interface
dhatchayani created CARBONDATA-2310: --- Summary: Refactored code to improve Distributable interface Key: CARBONDATA-2310 URL: https://issues.apache.org/jira/browse/CARBONDATA-2310 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (CARBONDATA-2265) [DFX]-Load]: Load job fails if 1 folder contains 1000 files
[ https://issues.apache.org/jira/browse/CARBONDATA-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-2265: --- Assignee: dhatchayani > [DFX]-Load]: Load job fails if 1 folder contains 1000 files > > > Key: CARBONDATA-2265 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2265 > Project: CarbonData > Issue Type: Bug > Environment: 3 node ant cluster >Reporter: Ajeet Rai >Assignee: dhatchayani >Priority: Major > Labels: DFX > > Load job fails if 1 folder contains 1000 files. > [Precondition]: Thrift server should be running > [Test step]: > 1: Create a carbon table > 2: Start a load where 1 folder contains 1000 files > 3: Observe that load fails > > Observe that Out of Memory exception is thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2223) Adding Listener Support for Partition
[ https://issues.apache.org/jira/browse/CARBONDATA-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2223: Summary: Adding Listener Support for Partition (was: Remove unused listeners) > Adding Listener Support for Partition > - > > Key: CARBONDATA-2223 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2223 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 6h > Remaining Estimate: 0h > > Remove unused listeners -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2223) Remove unused listeners
dhatchayani created CARBONDATA-2223: --- Summary: Remove unused listeners Key: CARBONDATA-2223 URL: https://issues.apache.org/jira/browse/CARBONDATA-2223 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani Remove unused listeners -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2125) like% filter is giving ArrayIndexOutOfBoundException in case of table having more pages
[ https://issues.apache.org/jira/browse/CARBONDATA-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani resolved CARBONDATA-2125. - Resolution: Fixed Fix Version/s: 1.3.0 > like% filter is giving ArrayIndexOutOfBoundException in case of table having > more pages > --- > > Key: CARBONDATA-2125 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2125 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > Fix For: 1.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:247) > at > org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:307) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:590) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:162) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:385) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:384) > at > org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:128) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) > at > org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130) > at > org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 1 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:242) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl.applyFilter(RowLevelFilterExecuterImpl.java:225) > at > org.apache.carbondata.core.scan.scanner.impl.FilterScanner.fillScannedResult(FilterScanner.java:168) > at > org.apache.carbondata.core.scan.scanner.impl.FilterScanner.scanBlocklet(FilterScanner.java:100) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:201) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:188) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 3 more -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2131) Alter table adding long datatype is failing but Create table with long type is successful, in Spark 2.1
dhatchayani created CARBONDATA-2131: --- Summary: Alter table adding long datatype is failing but Create table with long type is successful, in Spark 2.1 Key: CARBONDATA-2131 URL: https://issues.apache.org/jira/browse/CARBONDATA-2131 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani

create table test4(a1 int) stored by 'carbondata';
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.757 seconds)

alter table test4 add columns (a6 long);
Error: java.lang.RuntimeException: BaseSqlParser

== Parse1 ==
Operation not allowed: alter table add columns(line 1, pos 0)
== SQL ==
alter table test4 add columns (a6 long)
^^^

== Parse2 ==
[1.35] failure: identifier matching regex (?i)VARCHAR expected
alter table test4 add columns (a6 long)
                                  ^

CarbonSqlParser
[1.35] failure: identifier matching regex (?i)VARCHAR expected
alter table test4 add columns (a6 long)
                                  ^ (state=,code=0)

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
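The Parse2 failure above hints at the cause: at that position the alter-table grammar only accepts an identifier matching the pattern `(?i)VARCHAR`, so `long` is rejected even though the create-table grammar accepts it. A minimal sketch of that pattern check (illustrative only, not Carbon's actual parser code):

```python
import re

# Case-insensitive pattern taken from the parser's error message; at column
# 35 the grammar only admits an identifier matching (?i)VARCHAR.
TYPE_AT_POSITION = re.compile(r"(?i)VARCHAR")

def accepted(type_name: str) -> bool:
    """Return True if the token would satisfy the reported pattern."""
    return TYPE_AT_POSITION.fullmatch(type_name) is not None

print(accepted("varchar"))  # any casing of VARCHAR passes
print(accepted("long"))     # rejected, producing the [1.35] failure
```

This is why the statement fails in parsing rather than in execution; `bigint` (the type `long` aliases at create time) would presumably be the workaround, though the report does not confirm that.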
[jira] [Updated] (CARBONDATA-2125) like% filter is giving ArrayIndexOutOfBoundException in case of table having more pages
[ https://issues.apache.org/jira/browse/CARBONDATA-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2125: Description: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:247) at org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:307) at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:590) at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:162) at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:385) at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:384) at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:128) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:242) ... 19 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl.applyFilter(RowLevelFilterExecuterImpl.java:225) at org.apache.carbondata.core.scan.scanner.impl.FilterScanner.fillScannedResult(FilterScanner.java:168) at org.apache.carbondata.core.scan.scanner.impl.FilterScanner.scanBlocklet(FilterScanner.java:100) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:201) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:188) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 
3 more > like% filter is giving ArrayIndexOutOfBoundException in case of table having > more pages > --- > > Key: CARBONDATA-2125 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2125 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.ArrayIndexOutOfBoundsException: 1 > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:247) > at > org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:307) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:590) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:162) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:385) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$17.apply(CarbonScanRDD.scala:384) > at > org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:128) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117) > at > org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130) > at >
[jira] [Created] (CARBONDATA-2125) like% filter is giving ArrayIndexOutOfBoundException in case of table having more pages
dhatchayani created CARBONDATA-2125: --- Summary: like% filter is giving ArrayIndexOutOfBoundException in case of table having more pages Key: CARBONDATA-2125 URL: https://issues.apache.org/jira/browse/CARBONDATA-2125 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-1918) Incorrect data is displayed when String is updated using Sentences
[ https://issues.apache.org/jira/browse/CARBONDATA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1918: Description: update t_carbn01 set (active_status)= (sentences('Hello there! How are you?')); +-+--+ | Result | +-+--+ +-+--+ No rows selected (2.784 seconds) select active_status from t_carbn01; +-+--+ | active_status | +-+--+ *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* *| Hello\:there\\$How\:are\:you\\ |* +-+–+ The issue for sentences function also occurs when the below update is performed. update t_carbn01 set (active_status)= (split('ab', 'a')); > Incorrect data is displayed when String is updated using Sentences > -- > > Key: CARBONDATA-1918 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1918 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > update t_carbn01 set (active_status)= (sentences('Hello there! How are > you?')); > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (2.784 seconds) > select active_status from t_carbn01; > +-+--+ > | active_status | > +-+--+ > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > *| Hello\:there\\$How\:are\:you\\ |* > +-+–+ > > The issue for sentences function also occurs when the below update is > performed. 
> update t_carbn01 set (active_status)= (split('ab', 'a')); -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-1917) While loading, check for stale dictionary files
[ https://issues.apache.org/jira/browse/CARBONDATA-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani closed CARBONDATA-1917. --- Resolution: Invalid > While loading, check for stale dictionary files > --- > > Key: CARBONDATA-1917 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1917 > Project: CarbonData > Issue Type: Improvement >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2106) Update product document with page level reader property
dhatchayani created CARBONDATA-2106: --- Summary: Update product document with page level reader property Key: CARBONDATA-2106 URL: https://issues.apache.org/jira/browse/CARBONDATA-2106 Project: CarbonData Issue Type: Task Reporter: dhatchayani Assignee: Gururaj Shetty -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2064) Add compaction listener
dhatchayani created CARBONDATA-2064: --- Summary: Add compaction listener Key: CARBONDATA-2064 URL: https://issues.apache.org/jira/browse/CARBONDATA-2064 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2061) Check for only valid IN_PROGRESS segments
[ https://issues.apache.org/jira/browse/CARBONDATA-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2061: Description: While checking for IN_PROGRESS segments of a table during other operations, we should check only for valid IN_PROGRESS segments. Some segments may be invalid (e.g. a cancelled load) yet still be in IN_PROGRESS state; those segments should be considered stale segments. (was: While checking for IN_PROGRESS segments of a table during other operation, we should check only for valid IN_PROGRESS segments. Some segments may be invalid and still in IN_PROGRESS state, should be considered as stale segments.) > Check for only valid IN_PROGRESS segments > - > > Key: CARBONDATA-2061 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2061 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Major > > While checking for IN_PROGRESS segments of a table during other operations, > we should check only for valid IN_PROGRESS segments. Some segments may be > invalid (e.g. a cancelled load) yet still be in IN_PROGRESS state; those > segments should be considered stale segments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
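The intended check can be sketched as: a segment counts as a valid in-progress segment only if its tablestatus entry says IN_PROGRESS *and* its load still holds the segment lock; an IN_PROGRESS entry with no live lock is stale. A minimal sketch under assumed names (`Segment` and the `lock_held` flag are illustrative, not CarbonData's API):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    status: str      # e.g. "SUCCESS", "INSERT_IN_PROGRESS"
    lock_held: bool  # whether the loading session still holds the segment lock

IN_PROGRESS = {"INSERT_IN_PROGRESS", "INSERT_OVERWRITE_IN_PROGRESS"}

def valid_in_progress(segments):
    """Only segments whose load still holds its lock are truly in progress."""
    return [s for s in segments if s.status in IN_PROGRESS and s.lock_held]

def stale(segments):
    """IN_PROGRESS entries whose lock is gone (e.g. a cancelled load)."""
    return [s for s in segments if s.status in IN_PROGRESS and not s.lock_held]

segs = [Segment("0", "SUCCESS", False),
        Segment("1", "INSERT_IN_PROGRESS", True),   # live load: keep
        Segment("2", "INSERT_IN_PROGRESS", False)]  # cancelled load: stale
```

With this split, concurrent operations would only wait on segment "1", while segment "2" is eligible for cleanup.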
[jira] [Created] (CARBONDATA-2061) Check for only valid IN_PROGRESS segments
dhatchayani created CARBONDATA-2061: --- Summary: Check for only valid IN_PROGRESS segments Key: CARBONDATA-2061 URL: https://issues.apache.org/jira/browse/CARBONDATA-2061 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani While checking for IN_PROGRESS segments of a table during other operations, we should check only for valid IN_PROGRESS segments. Some segments may be invalid yet still in IN_PROGRESS state; such segments should be considered stale. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2015) Restricted maximum length of bytes per column
[ https://issues.apache.org/jira/browse/CARBONDATA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-2015: Description: Validation for the number of bytes per column is added. We have limited the number of characters per column to 32000. For example, a single unicode character can take 3 bytes, so if a column has 30,000 such characters, its 30,000 * 3 = 90,000 bytes exceed the short range (32,767) and the load will fail. > Restricted maximum length of bytes per column > - > > Key: CARBONDATA-2015 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2015 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Validation for the number of bytes per column is added. > We have limited the number of characters per column to 32000. > For example, a single unicode character can take 3 bytes, so if a column has > 30,000 such characters, its 30,000 * 3 = 90,000 bytes exceed the short range > (32,767) and the load will fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
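The arithmetic behind the limit can be checked directly: a Java short tops out at 32,767, so a column whose UTF-8 encoding exceeds that cannot be stored with a short-typed length. A quick sketch (the 32,000-character limit comes from the description above; the check itself is illustrative, not CarbonData's validation code):

```python
SHORT_MAX = 32767  # Java short range assumed for the stored byte length
MAX_CHARS = 32000  # per-column character limit mentioned in the issue

def fits(value: str) -> bool:
    """A value loads only if its UTF-8 byte length stays within short range."""
    return len(value.encode("utf-8")) <= SHORT_MAX

ascii_col = "a" * MAX_CHARS      # 32,000 one-byte characters: 32,000 bytes
unicode_col = "\u0915" * 30000   # Devanagari: 3 bytes each = 90,000 bytes

print(fits(ascii_col))    # within the short range
print(fits(unicode_col))  # exceeds it, so the load is rejected
```

This is why the validation must count bytes, not characters: both columns are under the 32,000-character cap, but only the ASCII one fits in a short.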
[jira] [Created] (CARBONDATA-2015) Restricted maximum length of bytes per column
dhatchayani created CARBONDATA-2015: --- Summary: Restricted maximum length of bytes per column Key: CARBONDATA-2015 URL: https://issues.apache.org/jira/browse/CARBONDATA-2015 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Attachment: Corrected_Data.JPG > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Attachments: Corrected_Data.JPG, Wrong_Data.JPG, beeline.JPG > > Time Spent: 1h > Remaining Estimate: 0h > > Input metrics is updated twice. Record count is updated twice and it is > wrongly displayed in Spark UI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Attachment: Wrong_Data.JPG > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Fix For: 1.3.0 > > Attachments: Wrong_Data.JPG, beeline.JPG > > Time Spent: 1h > Remaining Estimate: 0h > > Input metrics is updated twice. Record count is updated twice and it is > wrongly displayed in Spark UI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Attachment: beeline.JPG Description: Input metrics is updated twice. Record count is updated twice and it is wrongly displayed in Spark UI was:Input metrics is updated twice. Record count is updated twice and it is wrongly displayed in Spark UI > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: beeline.JPG > > Time Spent: 40m > Remaining Estimate: 0h > > Input metrics is updated twice. Record count is updated twice and it is > wrongly displayed in Spark UI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Description: Input metrics is updated twice. Record count is updated twice and it is wrongly displayed in Spark UI (was: Input metrics is updated twice) > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > > Input metrics is updated twice. Record count is updated twice and it is > wrongly displayed in Spark UI -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Attachment: (was: metrics2.JPG) > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > > Input metrics is updated twice -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1975) Wrong input metrics displayed for carbon
[ https://issues.apache.org/jira/browse/CARBONDATA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani updated CARBONDATA-1975: Description: Input metrics is updated twice > Wrong input metrics displayed for carbon > > > Key: CARBONDATA-1975 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 > Project: CarbonData > Issue Type: Bug >Reporter: dhatchayani >Assignee: dhatchayani >Priority: Minor > Attachments: metrics2.JPG > > > Input metrics is updated twice -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1975) Wrong input metrics displayed for carbon
dhatchayani created CARBONDATA-1975: --- Summary: Wrong input metrics displayed for carbon Key: CARBONDATA-1975 URL: https://issues.apache.org/jira/browse/CARBONDATA-1975 Project: CarbonData Issue Type: Bug Reporter: dhatchayani Assignee: dhatchayani Priority: Minor Attachments: metrics2.JPG -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1939) Added show segments validation test case
dhatchayani created CARBONDATA-1939: --- Summary: Added show segments validation test case Key: CARBONDATA-1939 URL: https://issues.apache.org/jira/browse/CARBONDATA-1939 Project: CarbonData Issue Type: Improvement Reporter: dhatchayani Assignee: dhatchayani Priority: Minor (1) Modified headers of show segments (2) Modified SDV test cases for validating headers and result -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1824) Carbon 1.3.0 - Spark 2.2-Residual segment files left over when load failure happens
[ https://issues.apache.org/jira/browse/CARBONDATA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16301084#comment-16301084 ] dhatchayani commented on CARBONDATA-1824: - please resolve this issue as this is already resolved by CARBONDATA-1759 > Carbon 1.3.0 - Spark 2.2-Residual segment files left over when load failure > happens > --- > > Key: CARBONDATA-1824 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1824 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: dhatchayani >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create a table with batch sort as sort type, keep block size small > 2. Run Load/Insert/Compaction the table > 3. Bring down thrift server when carbon data is being written to the segment > 4. Do show segments on the table > *+Expected:+* It should not show the residual segments > *+Actual:+* The segment intended for load is shown as marked for delete and > it does not get deleted with clean file. No impact on the table as such. 
> *+Query:+* > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='1','sort_scope'='BATCH_SORT','batch_sort_size_inmb'='5000'); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 0: jdbc:hive2://10.18.98.34:23040> select count(*) from t_carbn0161; > +---+--+ > | count(1) | > +---+--+ > | 0 | > +---+--+ > 1 row selected (13.011 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1; > +++--+--++--+--+ > | SegmentSequenceId | Status | Load Start Time | > Load End Time | Merged To | File Format | > +++--+--++--+--+ > | 1 | Marked for Delete | 2017-11-28 19:14:46.265 | > 2017-11-28 19:15:28.396 | NA | COLUMNAR_V3 | > | 0 | Marked for Delete | 2017-11-28 19:12:58.269 | > 2017-11-28 19:13:37.26 | NA | COLUMNAR_V3 | > +++--+--++--+--+ > 0: jdbc:hive2://10.18.98.34:23040> clean files for table t_carbn0161; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (7.473 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1; > +++--+--++--+--+ > | SegmentSequenceId | Status | Load Start Time | > Load End Time | Merged To | File Format | > +++--+--++--+--+ > | 1 | Marked for Delete | 2017-11-28 19:14:46.265 | > 2017-11-28 19:15:28.396 | NA | COLUMNAR_V3 | > | 0 | Marked for Delete | 2017-11-28 19:12:58.269 | > 2017-11-28 19:13:37.26 | NA | COLUMNAR_V3 | > +++--+--++--+--+ -- This message was sent by 
Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1824) Carbon 1.3.0 - Spark 2.2-Residual segment files left over when load failure happens
[ https://issues.apache.org/jira/browse/CARBONDATA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhatchayani reassigned CARBONDATA-1824: --- Assignee: dhatchayani (was: kumar vishal) > Carbon 1.3.0 - Spark 2.2-Residual segment files left over when load failure > happens > --- > > Key: CARBONDATA-1824 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1824 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.3.0 > Environment: Test - 3 node ant cluster >Reporter: Ramakrishna S >Assignee: dhatchayani >Priority: Minor > Labels: DFX > Fix For: 1.3.0 > > > Steps: > Beeline: > 1. Create a table with batch sort as sort type, keep block size small > 2. Run Load/Insert/Compaction the table > 3. Bring down thrift server when carbon data is being written to the segment > 4. Do show segments on the table > *+Expected:+* It should not show the residual segments > *+Actual:+* The segment intended for load is shown as marked for delete and > it does not get deleted with clean file. No impact on the table as such. 
> *+Query:+* > create table if not exists lineitem1(L_SHIPDATE string,L_SHIPMODE > string,L_SHIPINSTRUCT string,L_RETURNFLAG string,L_RECEIPTDATE > string,L_ORDERKEY string,L_PARTKEY string,L_SUPPKEY string,L_LINENUMBER > int,L_QUANTITY double,L_EXTENDEDPRICE double,L_DISCOUNT double,L_TAX > double,L_LINESTATUS string,L_COMMITDATE string,L_COMMENT string) STORED BY > 'org.apache.carbondata.format' TBLPROPERTIES > ('table_blocksize'='1','sort_scope'='BATCH_SORT','batch_sort_size_inmb'='5000'); > load data inpath "hdfs://hacluster/user/test/lineitem.tbl.1" into table > lineitem > options('DELIMITER'='|','FILEHEADER'='L_ORDERKEY,L_PARTKEY,L_SUPPKEY,L_LINENUMBER,L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT,L_TAX,L_RETURNFLAG,L_LINESTATUS,L_SHIPDATE,L_COMMITDATE,L_RECEIPTDATE,L_SHIPINSTRUCT,L_SHIPMODE,L_COMMENT'); > 0: jdbc:hive2://10.18.98.34:23040> select count(*) from t_carbn0161; > +---+--+ > | count(1) | > +---+--+ > | 0 | > +---+--+ > 1 row selected (13.011 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1; > +++--+--++--+--+ > | SegmentSequenceId | Status | Load Start Time | > Load End Time | Merged To | File Format | > +++--+--++--+--+ > | 1 | Marked for Delete | 2017-11-28 19:14:46.265 | > 2017-11-28 19:15:28.396 | NA | COLUMNAR_V3 | > | 0 | Marked for Delete | 2017-11-28 19:12:58.269 | > 2017-11-28 19:13:37.26 | NA | COLUMNAR_V3 | > +++--+--++--+--+ > 0: jdbc:hive2://10.18.98.34:23040> clean files for table t_carbn0161; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (7.473 seconds) > 0: jdbc:hive2://10.18.98.34:23040> show segments for table lineitem1; > +++--+--++--+--+ > | SegmentSequenceId | Status | Load Start Time | > Load End Time | Merged To | File Format | > +++--+--++--+--+ > | 1 | Marked for Delete | 2017-11-28 19:14:46.265 | > 2017-11-28 19:15:28.396 | NA | COLUMNAR_V3 | > | 0 | Marked for Delete | 2017-11-28 19:12:58.269 | > 2017-11-28 19:13:37.26 | NA | COLUMNAR_V3 | > +++--+--++--+--+ -- This message was sent by 
Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1896) Clean files operation improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhatchayani updated CARBONDATA-1896:
------------------------------------
    Description: 
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.

+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this can either be done manually on the needed tables through the existing CLEAN FILES DDL, or the next load will clean them automatically.

  was:
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.

+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this can either be done manually on the needed tables through the existing CLEAN FILES DDL, or the next load will clean them automatically.
*Impact analysis on the solution will be updated soon.*

> Clean files operation improvement
> ---------------------------------
>
>                 Key: CARBONDATA-1896
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1896
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +*Problem:*+
> When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.
> +*Solution:*+
> A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this can either be done manually on the needed tables through the existing CLEAN FILES DDL, or the next load will clean them automatically.
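The lock-based rule described above (a segment is considered stale only when its tablestatus entry says IN_PROGRESS *and* no live load still holds its SEGMENT_LOCK) can be sketched as follows. This is a minimal model under stated assumptions: the lock objects, status strings, and function name are illustrative, not CarbonData's actual code.

```python
import threading

# Statuses that mark a load as in flight in the tablestatus file.
IN_PROGRESS = {"INSERT_IN_PROGRESS", "INSERT_OVERWRITE_IN_PROGRESS"}

def find_stale_segments(table_status, segment_locks):
    """Return ids of segments that are safe to mark MARKED_FOR_DELETE.

    A segment is stale only when tablestatus says it is in progress *and*
    its SEGMENT_LOCK can be acquired, i.e. no live loader still holds it.
    """
    stale = []
    for seg_id, status in table_status.items():
        if status not in IN_PROGRESS:
            continue
        lock = segment_locks[seg_id]
        if lock.acquire(blocking=False):  # nobody is loading this segment
            lock.release()
            stale.append(seg_id)
        # else: a parallel session holds the lock -- leave that load alone
    return stale

# Segment "1" is being loaded by a parallel session (its lock is held);
# segment "0" died mid-load, so its lock is free and it is truly stale.
locks = {"0": threading.Lock(), "1": threading.Lock()}
locks["1"].acquire()
status = {"0": "INSERT_IN_PROGRESS", "1": "INSERT_IN_PROGRESS", "2": "SUCCESS"}
print(find_stale_segments(status, locks))  # -> ['0']
```

The key design point this models is that tablestatus alone cannot distinguish a crashed load from a live one in another session; the try-acquire on the segment lock supplies that missing signal.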
[jira] [Updated] (CARBONDATA-1896) Clean files operation improvement
[ https://issues.apache.org/jira/browse/CARBONDATA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhatchayani updated CARBONDATA-1896:
------------------------------------
    Description: 
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.

+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this can either be done manually on the needed tables through the existing CLEAN FILES DDL, or the next load will clean them automatically.
*Impact analysis on the solution will be updated soon.*

  was:
+*Problem:*+
When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.

+*Solution:*+
A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this should be done manually on the needed tables through the existing CLEAN FILES DDL.
*Impact analysis on the solution will be updated soon.*

> Clean files operation improvement
> ---------------------------------
>
>                 Key: CARBONDATA-1896
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1896
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +*Problem:*+
> When a session is brought up, the clean operation marks every INSERT_OVERWRITE_IN_PROGRESS or INSERT_IN_PROGRESS segment as MARKED_FOR_DELETE in the tablestatus file. This clean operation does not consider other parallel sessions: if another session's data load is IN_PROGRESS while a session is being brought up, that executing load is also changed to MARKED_FOR_DELETE, irrespective of its actual status. Cleaning stale segments during session bring-up also increases the session startup time.
> +*Solution:*+
> A SEGMENT_LOCK should be taken on the new segment while loading. While cleaning segments, both the tablestatus file and the SEGMENT_LOCK should be considered. Cleaning stale files during session bring-up should be removed; this can either be done manually on the needed tables through the existing CLEAN FILES DDL, or the next load will clean them automatically.
> *Impact analysis on the solution will be updated soon.*