[jira] [Resolved] (CARBONDATA-4262) [summer-2021] Huawei's first big data open source project

2022-05-10 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4262.
--
Resolution: Fixed

> [summer-2021] Huawei's first big data open source project
> -
>
> Key: CARBONDATA-4262
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4262
> Project: CarbonData
>  Issue Type: Task
>  Components: docs, examples, test
>Affects Versions: 2.1.0, 2.1.1
>Reporter: CHEN XIN
>Assignee: CHEN XIN
>Priority: Minor
> Fix For: 2.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (CARBONDATA-4329) External Table Creation overwrites schema and drop external table deletes the location data

2022-04-01 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4329.
--
Fix Version/s: 2.3.1
   Resolution: Fixed

> External Table Creation overwrites schema and drop external table deletes the 
> location data
> ---
>
> Key: CARBONDATA-4329
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4329
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Issue 1:
> When we create an external table on a transactional table's location, a schema 
> file is already present. While creating the external table, which is also 
> transactional, the existing schema file is overwritten.
> Issue 2:
> If an external table is created on a location where the source table already 
> exists, dropping the external table deletes the table data, and queries on the 
> source table fail.
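> A minimal SQL sketch of the scenario (hypothetical paths and table names, not 
> taken from the original report):
> create table src(id int) stored as carbondata location 'hdfs://hacluster/tmp/src';
> insert into src select 1;
> -- creating the external table below overwrites the schema file at /tmp/src,
> -- and dropping it also deletes the data files the source table still needs
> create external table ext stored as carbondata location 'hdfs://hacluster/tmp/src';
> drop table ext;
> select * from src; -- fails after the drop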



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4327) Update documentation related to partition

2022-03-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4327.
--
Fix Version/s: 2.3.1
   Resolution: Fixed

> Update documentation related to partition
> -
>
> Key: CARBONDATA-4327
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4327
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.3.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Drop partition with data is not supported, and a few of the links are not 
> working in 
> https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (CARBONDATA-4306) Query Performance issue with Spark 3.1

2022-03-07 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4306:
-
Fix Version/s: 2.3.1
   (was: 2.3.0)

> Query Performance issue with Spark 3.1
> --
>
> Key: CARBONDATA-4306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4306
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Some rules are applied many times while running benchmark queries like TPCDS 
> and TPCH



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (CARBONDATA-4306) Query Performance issue with Spark 3.1

2022-03-07 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor closed CARBONDATA-4306.


> Query Performance issue with Spark 3.1
> --
>
> Key: CARBONDATA-4306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4306
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
> Fix For: 2.3.1
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Some rules are applied many times while running benchmark queries like TPCDS 
> and TPCH



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4318) Partition overwrite performance degrades as number of loads increases

2021-12-29 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4318.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Partition overwrite performance degrades as number of loads increases
> 
>
> Key: CARBONDATA-4318
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4318
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Partition overwrite performance degrades as the number of loads increases.
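> A minimal sketch of the affected operation (hypothetical table and partition 
> names):
> create table sales(id int, amount double) partitioned by (dt string) stored as carbondata;
> -- reported to slow down progressively as the number of prior loads grows
> insert overwrite table sales partition(dt='2021-12-01') select 1, 10.0;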



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4319) Fixed clean files not deleting stale delete delta files after horizontal compaction

2021-12-28 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4319.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Fixed clean files not deleting stale delete delta files after horizontal 
> compaction
> -
>
> Key: CARBONDATA-4319
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4319
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4317) TPCDS perf issues

2021-12-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4317.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> TPCDS perf issues
> -
>
> Key: CARBONDATA-4317
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4317
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The following issues have degraded TPCDS query performance:
>  # If a dynamic filter is not present in the partitionFilters set, that 
> filter is skipped and not pushed down to Spark.
>  # In some cases, nodes like Exchange / Shuffle are not reused because the 
> CarbonDataSourceScan plans are not matched.
>  # Accessing the metadata on the canonicalized plan throws an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4316) Horizontal compaction fails for partition table

2021-12-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4316.
--
Resolution: Fixed

> Horizontal compaction fails for partition table
> ---
>
> Key: CARBONDATA-4316
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4316
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a delete operation is performed on a partition table, horizontal 
> compaction fails, leading to a lot of small delete delta files and impacting 
> query performance.
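> A minimal sketch of the failing flow (hypothetical names):
> create table pt(id int, name string) partitioned by (dt string) stored as carbondata;
> insert into pt select 1, 'a', '2021-12-01';
> insert into pt select 2, 'b', '2021-12-01';
> -- each delete writes a delete delta file; horizontal compaction should merge
> -- them, but fails on partition tables, so the small files accumulate
> delete from pt where id = 1;
> delete from pt where id = 2;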



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4305) Support Carbondata Streamer tool to fetch data incrementally and merge

2021-11-25 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4305.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Support Carbondata Streamer tool to fetch data incrementally and merge
> --
>
> Key: CARBONDATA-4305
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4305
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> Support a Spark streaming application that fetches new incremental data from 
> sources like Kafka and DFS, deduplicates it, and merges the changes onto the 
> target CarbonData table.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4296) Handle schema evolution, enforcement and deduplication

2021-11-15 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4296.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Handle schema evolution, enforcement and deduplication
> --
>
> Key: CARBONDATA-4296
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4296
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: data-load
>Reporter: Pratyaksh Sharma
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (CARBONDATA-4306) Query Performance issue with Spark 3.1

2021-10-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4306.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Query Performance issue with Spark 3.1
> --
>
> Key: CARBONDATA-4306
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4306
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Some rules are applied many times while running benchmark queries like TPCDS 
> and TPCH



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4293) Table without External keyword is created as external table in local mode

2021-10-07 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4293.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Table without External keyword is created as external table in local mode
> -
>
> Key: CARBONDATA-4293
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4293
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4203) Compaction of added SDK segments causes a compaction issue after update and delete operations.

2021-10-07 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4203.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Compaction of added SDK segments causes a compaction issue after update and 
> delete operations.
> -
>
> Key: CARBONDATA-4203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4203
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.1
> Environment: FI cluster - 3 node
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: primitive- SDK files.rar
>
>
> Compaction of SDK segments added through the add segment command causes a 
> compaction issue after update and delete operations. This issue occurs only 
> when a delete and update happen on one of the added segments; it is not seen 
> without a delete and update on one segment.
> Place the attached SDK files in the /sdkfiles/primitive/, /sdkfiles/primitive2/, 
> /sdkfiles/primitive3/, /sdkfiles/primitive4/ and /sdkfiles/primitive5/ folders 
> in HDFS and then execute the below queries.
> Test queries:
> drop table if exists external_primitive;
>  create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
>  insert into external_primitive select 
> 1,"Pr",1,10,true,"1992-12-09","1992-10-07 22:00:20.0","chennai","CSE";
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
>  delete from external_primitive where id =2;
>  update external_primitive set (name)=("RAMU") where name="CCC";
> drop table if exists external_primitive;
>  create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
> alter table external_primitive compact 'minor';
>  
> !image-2021-06-08-16-54-52-412.png!
> Error traces: 
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs 
> for more info. Exception in compaction Compaction Failure in Merger Rdd.
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.spark.sql.AnalysisException: Compaction failed. Please 
> check 

[jira] [Resolved] (CARBONDATA-4228) Deleted records are reappearing in the select queries from the alter-added carbon segments after delete, update operations.

2021-10-07 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4228.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Deleted records are reappearing in the select queries from the alter-added 
> carbon segments after delete, update operations.
> -
>
> Key: CARBONDATA-4228
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4228
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Deleted records are not actually deleted and still appear in select queries on 
> the alter-added carbon segments after delete and update operations. 
> Test queries:
> drop table uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> --Create a copy of files from the first seg; 
> --hdfs dfs -rm -r -f /uniq1/*;
> --hdfs dfs -mkdir -p /uniq1/ 
> --hdfs dfs -cp 
> /user/hive/warehouse/carbon.store/rps/uniqdata/Fact/Part0/Segment_0/* /uniq1/;
> --hdfs dfs -ls /uniq1/;
> use rps;
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/uniq1/','format'='carbon');
> --update and delete work fine without throwing an error, but they won't work on 
> the added carbon segments;
> delete from uniqdata where cust_id=9001;
> update uniqdata set (cust_name)=('Rahu') where cust_id=1;
> set carbon.input.segments.rps.uniqdata=1;
> --First segment represents the added segment;
> select cust_name from uniqdata where cust_id=1;--CUST_NAME_01000 - 
> incorrect value should be Rahu;
> select count(*) from uniqdata where cust_id=9001;--returns 1 - incorrect, 
> should be 0 as 9001 cust_id records are deleted through Delete DDL;
> reset;
>  
> Console:
> > Alter table uniqdata add segment options 
> > ('path'='hdfs://hacluster/uniq1/','format'='carbon');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.226 seconds)
> >
> > delete from uniqdata where cust_id=9001;
> INFO : Execution ID: 139
> +-------------------+
> | Deleted Row Count |
> +-------------------+
> | 2                 |
> +-------------------+
> 1 row selected (5.321 seconds)
> > update uniqdata set (cust_name)=('Rahu') where cust_id=1;
> INFO : Execution ID: 142
> +-------------------+
> | Updated Row Count |
> +-------------------+
> | 2                 |
> +-------------------+
> 1 row selected (7.938 seconds)
> >
> >
> > set carbon.input.segments.rps.uniqdata=1;
> +------------------------------------+--------+
> | key                                | value  |
> +------------------------------------+--------+
> | carbon.input.segments.rps.uniqdata | 1      |
> +------------------------------------+--------+
> 1 row selected (0.05 seconds)
> > --First segment represent the added segment;
> > select cust_name from uniqdata where cust_id=1;--CUST_NAME_01000 - 
> > incorrect value should be Rahu;
> INFO : Execution ID: 147
> +------------------+
> | cust_name        |
> +------------------+
> | CUST_NAME_01000  |
> +------------------+
> 1 row selected (0.93 seconds)
> > select count(*) from uniqdata where cust_id=9001;--returns 1 - incorrect, 
> > should be 0 as 9001 cust_id records are deleted through Delete DDL;
> INFO : Execution ID: 148
> +-----------+
> | count(1)  |
> +-----------+
> | 1         |
> +-----------+
> 1 row selected (1.149 seconds)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4289) Wrong cache value shown after firing concurrent select queries to index server

2021-09-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4289.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Wrong cache value shown after firing concurrent select queries to index 
> server
> ---
>
> Key: CARBONDATA-4289
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4289
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.3.0
>
>
> Steps to reproduce:
> Start index server
> Fire 8 loads concurrently from different spark-sql to the same index server
> Show metacache shows extra segments in the index server cache.
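> For reference, the cache can be inspected with the metacache command (a 
> sketch, assuming a table named t):
> show metacache;
> show metacache on table t;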



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4288) Index Server loading duplicate cache to other executors in the case of SI table

2021-09-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4288.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Index Server loading duplicate cache to other executors in the case of SI 
> table
> ---
>
> Key: CARBONDATA-4288
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4288
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Steps to reproduce:
> Start index server
> Disable prepriming
> Create main table
> create SI table
> Load to main table
> Cache in index server has 1 entry even if prepriming is disabled
> do select * on main table
> Show metacache shows 2/1 cache in the Index server
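> A minimal SQL sketch of the above steps (hypothetical names; prepriming is 
> assumed disabled via the index server configuration):
> create table maintable(id int, name string) stored as carbondata;
> create index si_name on table maintable(name) as 'carbondata';
> insert into maintable select 1, 'a';
> select * from maintable;
> -- reported to show 2/1 cache entries in the index server instead of 1/1
> show metacache on table maintable;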



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4285) complex columns with global sort compaction fails

2021-09-19 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4285.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> complex columns with global sort compaction fails
> -
>
> Key: CARBONDATA-4285
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4285
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Compaction with global sort fails for complex columns.
>  
> Steps to reproduce:
> 1) create table with global sort
> 2) load the data multiple times
> 3) alter add columns
> 4) insert the data
> 5) repeat steps 3 and 4 four times
> 6) execute the compaction.
> test("test the complex columns with global sort compaction") {
>  sql("DROP TABLE IF EXISTS alter_global1")
>  sql("CREATE TABLE alter_global1(intField INT) STORED AS carbondata " +
>  "TBLPROPERTIES('sort_columns'='intField','sort_scope'='global_sort')")
>  sql("insert into alter_global1 values(1)")
>  sql("insert into alter_global1 values(2)")
>  sql("insert into alter_global1 values(3)")
>  sql( "ALTER TABLE alter_global1 ADD COLUMNS(str1 array)")
>  sql("insert into alter_global1 values(4, array(1))")
>  checkAnswer(sql("select * from alter_global1"),
>  Seq(Row(1, null), Row(2, null), Row(3, null), Row(4, make(Array(1)
>  val addedColumns = addedColumnsInSchemaEvolutionEntry("alter_global1")
>  assert(addedColumns.size == 1)
>  sql("alter table alter_global1 compact 'minor'")
>  checkAnswer(sql("select * from alter_global1"),
>  Seq(Row(1, null), Row(2, null), Row(3, null), Row(4, make(Array(1)
>  sql("DROP TABLE IF EXISTS alter_global1")
> }
> test("test the multi-level complex columns with global sort compaction") {
>  sql("DROP TABLE IF EXISTS alter_global2")
>  sql("CREATE TABLE alter_global2(intField INT) STORED AS carbondata " +
>  "TBLPROPERTIES('sort_columns'='intField','sort_scope'='global_sort')")
>  sql("insert into alter_global2 values(1)")
>  // multi-level nested array
>  sql(
>  "ALTER TABLE alter_global2 ADD COLUMNS(arr1 array>, arr2 
> array  "map1:Map>>) ")
>  sql(
>  "insert into alter_global2 values(1, array(array(1,2)), 
> array(named_struct('a1','st'," +
>  "'map1', map('a','b'")
>  // multi-level nested struct
>  sql("ALTER TABLE alter_global2 ADD COLUMNS(struct1 struct array>," +
>  " struct2 struct>>) ")
>  sql("insert into alter_global2 values(1, " +
>  "array(array(1,2)), array(named_struct('a1','st','map1', map('a','b'))), " +
>  "named_struct('s1','hi','arr',array(1,2)), 
> named_struct('num',2.3,'contact',map('ph'," +
>  "array(1,2")
>  // multi-level nested map
>  sql(
>  "ALTER TABLE alter_global2 ADD COLUMNS(map1 map>, map2 
> map  "struct>>)")
>  sql("insert into alter_global2 values(1, " +
>  "array(array(1,2)), array(named_struct('a1','st','map1', map('a','b'))), " +
>  "named_struct('s1','hi','arr',array(1,2)), 
> named_struct('num',2.3,'contact',map('ph'," +
>  "array(1,2))),map('a',array('hi')), 
> map('a',named_struct('d',23,'s',named_struct('im'," +
>  "'sh'")
>  val addedColumns = addedColumnsInSchemaEvolutionEntry("alter_global2")
>  assert(addedColumns.size == 6)
>  sql("alter table alter_global2 compact 'minor'")
>  sql("DROP TABLE IF EXISTS alter_global2")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4284) Load/insert after alter add column on partition table with complex column fails

2021-09-16 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4284.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Load/insert after alter add column on partition table with complex column 
> fails 
> 
>
> Key: CARBONDATA-4284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4284
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Insert after alter add column on a partition table with a complex column fails 
> with a BufferUnderflowException
> [Steps] :-
> drop table if exists strarmap1; create table strarmap1(id int,str 
> struct>,arr 
> array>) PARTITIONED BY(name string) stored as 
> carbondata 
> tblproperties('local_dictionary_enable'='true','local_dictionary_include'='name,str,arr');
>  load data inpath 'hdfs://hacluster/chetan/strarmap1.csv' into table 
> strarmap1 partition(name='name0') 
> options('fileheader'='id,name,str,arr','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');
>  select * from strarmap1 limit 1; show partitions strarmap1; ALTER TABLE 
> strarmap1 ADD COLUMNS(map1 Map, map2 Map, map3 
> Map, map4 Map, map5 
> Map,map6 Map,map7 map>, 
> map8 map>>); load data inpath 
> 'hdfs://hacluster/chetan/strarmap1.csv' into table strarmap1 
> partition(name='name0') 
> options('fileheader'='id,name,str,arr,map1,map2,map3,map4,map5,map6,map7,map8','COMPLEX_DELIMITER_LEVEL_3'='#','COMPLEX_DELIMITER_LEVEL_2'='$','COMPLEX_DELIMITER_LEVEL_1'='&','BAD_RECORDS_ACTION'='FORCE');
> [Expected Result] :- load after adding map columns on the partition table 
> should succeed
> [Actual Issue] :- load after adding map columns on the partition table fails



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4271) Support DPP for carbon filters

2021-09-01 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4271.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

> Support DPP for carbon filters
> --
>
> Key: CARBONDATA-4271
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4271
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4274) Create partition table error with spark 3.1

2021-08-31 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4274.
--
Fix Version/s: 2.3.0
   Resolution: Fixed

>  Create partition table error with spark 3.1
> 
>
> Key: CARBONDATA-4274
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4274
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> With Spark 3.1, we can create a partition table by giving partition columns 
> from the schema, like the example below:
> {{create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as 
> carbondata partitioned by (v2,c2)}}
> When the table is created by a SparkSession with CarbonExtensions, the catalog 
> table is created with the specified partitions.
> But in a cluster / with CarbonSession, when we create a partition table with 
> the above syntax, it creates a normal table with no partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4251) optimize clean index file performance

2021-07-27 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4251:
-
Fix Version/s: (was: 2.2.1)
   2.2.0

> optimize clean index file performance
> -
>
> Key: CARBONDATA-4251
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4251
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2.0
>Reporter: Jiayu Shen
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When clean files cleans up data, it cleans up all the carbonindex and 
> carbonmergeindex files that once existed, even though many of those carbonindex 
> files have already been deleted after being merged into a carbonmergeindex 
> file. Considering that tens of thousands of carbonindex files may once have 
> existed after the completion of compaction, the clean files command can take 
> several hours.
> Here, we just need to clean up the files that still exist, whether 
> carbonmergeindex or carbonindex files.
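> The command in question (a sketch, assuming a table named t):
> -- before this change, this also looked up every carbonindex file that had
> -- already been merged away, which is what made it slow
> clean files for table t;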



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4204) When the path is empty in Carbon add segments then "String Index out of range" error is thrown.

2021-07-27 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4204.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> When the path is empty in Carbon add segments then "String Index out of 
> range" error is thrown.
> ---
>
> Key: CARBONDATA-4204
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4204
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.1
> Environment: 3 node FI cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Test queries:
> CREATE TABLE uniqdata(cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> Alter table uniqdata add segment options ('path'='','format'='carbon');
> --
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
>  at java.lang.String.charAt(String.java:658)
>  at 
> org.apache.spark.sql.execution.command.management.CarbonAddLoadCommand.processMetadata(CarbonAddLoadCommand.scala:93)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.$anonfun$run$1(package.scala:137)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
>  at 
> org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:80)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
>  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
>  at 
> 

[jira] [Resolved] (CARBONDATA-4231) On update operation with 3.1v, cloned spark session is used and set properties are lost.

2021-07-27 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4231.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> On update operation with 3.1v, cloned spark session is used and set 
> properties are lost.
> 
>
> Key: CARBONDATA-4231
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4231
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Major
> Fix For: 2.2.0
>
>
> *Update operation with bad records property fails with 3.1v.* 
> *[Steps to reproduce]:*
> 0: jdbc:hive2://linux-221:22550/> set carbon.options.bad.records.action=force;
> +------------------------------------+--------+
> | key                                | value  |
> +------------------------------------+--------+
> | carbon.options.bad.records.action  | force  |
> +------------------------------------+--------+
> 1 row selected (0.04 seconds)
> 0: jdbc:hive2://linux-221:22550/> create table t_carbn1(item_type_cd int, 
> sell_price bigint, profit decimal(10,4), item_name string, update_time 
> timestamp) stored as carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (2.117 seconds)
> 0: jdbc:hive2://linux-221:22550/> insert into t_carbn1 select 2, 
> 10,23.3,'Apple','2012-11-11 11:11:11';
> INFO : Execution ID: 858
> +-------------+
> | Segment ID  |
> +-------------+
> | 0           |
> +-------------+
> 1 row selected (4.278 seconds)
> 0: jdbc:hive2://linux-221:22550/> update t_carbn1 set (item_type_cd) = 
> (item_type_cd/1);
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> java.lang.RuntimeException: Update operation failed. DataLoad failure
> *[Root cause]:*
> On the update command, persist is called, and with the latest Spark 3.1 
> changes, Spark returns a cloned SparkSession from the cacheManager with all 
> specified configurations disabled. Since a different SparkSession is now used 
> for 3.1 which is not initialized in CarbonEnv, CarbonEnv.init is called, where 
> a new CarbonSessionInfo is created with no sessionParams. So the properties 
> that were set are not accessible.
> Spark creates the cloned spark session based on the following properties:
> 1. spark.sql.optimizer.canChangeCachedPlanOutputPartitioning
> 2. spark.sql.sources.bucketing.autoBucketedScan.enabled
> 3.  spark.sql.adaptive.enabled
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4210) Support Alter change column for complex columns and fix other issues for Spark 3.1.1

2021-07-13 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4210:
-
Fix Version/s: 2.2.0

> Support Alter change column for complex columns and fix other issues for 
> Spark 3.1.1
> 
>
> Key: CARBONDATA-4210
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4210
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Support Alter change column for complex columns and fix other issues for 
> Spark 3.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4210) Support Alter change column for complex columns and fix other issues for Spark 3.1.1

2021-07-13 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4210.
--
Resolution: Fixed

> Support Alter change column for complex columns and fix other issues for 
> Spark 3.1.1
> 
>
> Key: CARBONDATA-4210
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4210
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Vikram Ahuja
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Support Alter change column for complex columns and fix other issues for 
> Spark 3.1.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4191) update table for primitive column not working when complex child column name and primitive column name match

2021-06-02 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4191.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> update table for primitive column not working when complex child column name 
> and primitive column name match
> 
>
> Key: CARBONDATA-4191
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4191
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
>  
> Steps to reproduce the issue:
> drop table if exists update_complex;
> create table update_complex (a int, b string, struct1 STRUCT<a:int, 
> c:string>) stored as carbondata;
> insert into update_complex select 1,'c', named_struct('a',4,'b','d');
> update update_complex set (a)=(4);



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4186) Insert query is failing when partition column is part of local sort scope.

2021-06-02 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4186.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Insert query is failing when partition column is part of local sort scope.
> --
>
> Key: CARBONDATA-4186
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4186
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, when we create a table with a partition column and include the same 
> column in the local sort scope, the insert query fails with an 
> ArrayIndexOutOfBoundsException.
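> A minimal sketch of the failing case (hypothetical names):
> create table lt(id int, name string) partitioned by (dt string) stored as carbondata 
> tblproperties('sort_scope'='local_sort','sort_columns'='dt');
> -- fails with an ArrayIndexOutOfBoundsException because the partition column
> -- dt is also part of the local sort scope
> insert into lt select 1, 'a', '2021-06-01';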



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4185) Heterogeneous format segments in carbondata documentation

2021-05-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4185.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Heterogeneous format segments in carbondata documentation
> 
>
> Key: CARBONDATA-4185
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4185
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Heterogeneous format segments in carbondata documentation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4188) Select query fails for longstring data with small table page size after alter add columns

2021-05-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4188.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Select query fails for longstring data with small table page size after alter 
> add columns
> -
>
> Key: CARBONDATA-4188
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4188
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce:
>  # Create a table with a small page size and a longstring data type.
>  # Load a large amount of data (more than one page should be created).
>  # Alter add an int column on the same table.
>  # A select query with a filter on the newly added column fails with an 
> ArrayIndexOutOfBoundsException.
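> A minimal sketch of the above steps (hypothetical names; 'long_string_columns' 
> and 'table_page_size_inmb' are existing CarbonData table properties):
> create table ls(id int, notes string) stored as carbondata 
> tblproperties('long_string_columns'='notes','table_page_size_inmb'='1');
> -- load enough long-string rows that more than one page is created, then:
> alter table ls add columns(extra int);
> -- a filter on the newly added column hit the ArrayIndexOutOfBoundsException
> select * from ls where extra = 5;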



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4162) Leverage Secondary Index till segment level with SI as datamap and SI with plan rewrite

2021-05-10 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4162.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

https://github.com/apache/carbondata/pull/4116

> Leverage Secondary Index till segment level with SI as datamap and SI with 
> plan rewrite
> ---
>
> Key: CARBONDATA-4162
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4162
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.2.0
>
> Attachments: Support SI at segment level.pdf
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> *Background:*
> Secondary index tables are created as indexes and managed as child tables 
> internally by CarbonData. In the existing architecture, if the parent (main) 
> table and the SI table don't have the same valid segments, we disable the SI 
> table. Then, from the next query onwards, we scan and prune only the parent 
> table until the next load or REINDEX command is triggered (as these commands 
> bring the parent and SI table segments back in sync). Because of this, queries 
> take more time to return results while SI is disabled.
> *Proposed Solution:*
> We plan to leverage SI down to the segment level. This means that instead of 
> disabling the SI table (when parent and child table segments are not in sync), 
> we will prune on the SI tables for all the valid segments (segments with status 
> success, marked for update, and load partial success), and the rest of the 
> segments will be pruned by the parent table.
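> For reference, the command that currently re-syncs a stale SI with its main 
> table (a sketch, assuming an index named si_idx on table maintable):
> reindex index table si_idx on maintable;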



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4175) Issue with array_contains after altering schema for array types

2021-05-10 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4175.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Issue with array_contains after altering schema for array types
> ---
>
> Key: CARBONDATA-4175
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4175
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> NPE on executing filter query after adding array column to the carbon table
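> A minimal sketch of the failing query (hypothetical names):
> create table arr_t(id int) stored as carbondata;
> insert into arr_t select 1;
> alter table arr_t add columns(arr array<string>);
> -- the filter on the newly added array column hit the NPE
> select * from arr_t where array_contains(arr, 'x');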



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4175) Issue with array_contains after altering schema for array types

2021-05-10 Thread Kunal Kapoor (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17342316#comment-17342316
 ] 

Kunal Kapoor commented on CARBONDATA-4175:
--

PR: https://github.com/apache/carbondata/pull/4116

> Issue with array_contains after altering schema for array types
> ---
>
> Key: CARBONDATA-4175
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4175
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> NPE on executing filter query after adding array column to the carbon table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4172) Select query having parent and child struct column in projection returns incorrect results

2021-04-26 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4172.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Select query having  parent and child struct column in projection returns 
> incorrect results
> ---
>
> Key: CARBONDATA-4172
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4172
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> struct column: col1 struct<a:int, b:int, c:string>
> insert: named_struct('a',1,'b',2,'c','a')
> Query: select col1, col1.a from table;
> Result:
> col1 col1.a
> {a:1,b:null,c:null}  1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4158) Make Secondary Index as a coarse grain datamap and use secondary indexes for Presto queries

2021-04-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4158.
--
Fix Version/s: 2.2.0
   Resolution: Fixed

> Make Secondary Index as a coarse grain datamap and use secondary indexes for 
> Presto queries
> ---
>
> Key: CARBONDATA-4158
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4158
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Minor
> Fix For: 2.2.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> *Background:*
> Secondary Indexes are created as carbon tables and are managed as child 
> tables to the main table. These indexes are leveraged for query pruning 
> via Spark plan modification during the optimizer/execution phases of query 
> execution. In order to make use of Secondary Indexes for queries from engines 
> other than Spark, such as Presto, it is not feasible to modify the 
> engine-specific query execution plans as we do in the current approach, which 
> makes Secondary Indexes unusable for Presto query pruning. Thus the need 
> arises for an engine-agnostic approach to use Secondary Indexes for Presto 
> queries.
> *Description:*
> Current Secondary Index pruning is tightly coupled with Spark because the 
> query plan modification is specific to the Spark engine. It is hard to reuse 
> the solution for Presto queries. We need a new solution to use secondary 
> indexes with Presto queries, and it shouldn't affect existing customers using 
> a secondary index with Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4156) Segment min max is not written considering all blocks in a segment

2021-03-25 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4156.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Segment min max is not written considering all blocks in a segment
> --
>
> Key: CARBONDATA-4156
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4156
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CARBONDATA-4155) Create table like on table with MV fails

2021-03-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor reassigned CARBONDATA-4155:


Assignee: (was: Kunal Kapoor)

> Create table like on table with MV fails 
> -
>
> Key: CARBONDATA-4155
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4155
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> steps to reproduce:
> create table maintable(name string, c_code int, price int) 
> STORED AS carbondata;
> create materialized view mv_table as select name, sum(price) 
> from maintable group by name;
> create table new_Table like maintable;
> Result:
> 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - 
> org.apache.spark.sql.AnalysisException: == Spark Parser: 
> org.apache.spark.sql.execution.SparkSqlParser ==
> extraneous input 'default' expecting \{')', ','}(line 8, pos 25)
> == SQL ==
> CREATE TABLE default.new_table
> (`name` string,`c_code` int,`price` int)
> USING carbondata
> OPTIONS (
>  indexexists "false",
>  sort_columns "",
>  comment "",
>  relatedmvtablesmap "\{"default":["mv_table"]}",
> -^^^
>  bad_record_path "",
>  local_dictionary_enable "true",
>  indextableexists "false",
>  tableName "new_table",
>  dbName "default",
>  tablePath 
> "/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  path 
> "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  isExternal "false",
>  isTransactional "true",
>  isVisible "true"
>  ,carbonSchemaPartsNo '1',carbonSchema0 
> '\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift":false,"isSchemaModified":false}')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4155) Create table like on table with MV fails

2021-03-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4155.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Create table like on table with MV fails 
> -
>
> Key: CARBONDATA-4155
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4155
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Kunal Kapoor
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> steps to reproduce:
> create table maintable(name string, c_code int, price int) 
> STORED AS carbondata;
> create materialized view mv_table as select name, sum(price) 
> from maintable group by name;
> create table new_Table like maintable;
> Result:
> 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - 
> org.apache.spark.sql.AnalysisException: == Spark Parser: 
> org.apache.spark.sql.execution.SparkSqlParser ==
> extraneous input 'default' expecting \{')', ','}(line 8, pos 25)
> == SQL ==
> CREATE TABLE default.new_table
> (`name` string,`c_code` int,`price` int)
> USING carbondata
> OPTIONS (
>  indexexists "false",
>  sort_columns "",
>  comment "",
>  relatedmvtablesmap "\{"default":["mv_table"]}",
> -^^^
>  bad_record_path "",
>  local_dictionary_enable "true",
>  indextableexists "false",
>  tableName "new_table",
>  dbName "default",
>  tablePath 
> "/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  path 
> "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  isExternal "false",
>  isTransactional "true",
>  isVisible "true"
>  ,carbonSchemaPartsNo '1',carbonSchema0 
> '\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift":false,"isSchemaModified":false}')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CARBONDATA-4155) Create table like on table with MV fails

2021-03-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor reassigned CARBONDATA-4155:


Assignee: Kunal Kapoor

> Create table like on table with MV fails 
> -
>
> Key: CARBONDATA-4155
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4155
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Kunal Kapoor
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> steps to reproduce:
> {color:#067d17}create table maintable(name string, c_code int, price int) 
> STORED AS carbondata;{color}
> {color:#067d17}create materialized view mv_table as select name, sum(price) 
> from maintable group by name;{color}
> {color:#067d17}create table new_Table like maintable;{color}
> {color:#172b4d}Result:
> {color}
> 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - 
> org.apache.spark.sql.AnalysisException: == Spark Parser: 
> org.apache.spark.sql.execution.SparkSqlParser ==
> extraneous input 'default' expecting \{')', ','}(line 8, pos 25)
> == SQL ==
> CREATE TABLE default.new_table
> (`name` string,`c_code` int,`price` int)
> USING carbondata
> OPTIONS (
>  indexexists "false",
>  sort_columns "",
>  comment "",
>  relatedmvtablesmap "\{"default":["mv_table"]}",
> -^^^
>  bad_record_path "",
>  local_dictionary_enable "true",
>  indextableexists "false",
>  tableName "new_table",
>  dbName "default",
>  tablePath 
> "/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  path 
> "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  isExternal "false",
>  isTransactional "true",
>  isVisible "true"
>  ,carbonSchemaPartsNo '1',carbonSchema0 
> '\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift":false,"isSchemaModified":false}')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4153) Do not push down 'not equal to' filter with Cast on SI

2021-03-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4153.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Do not push down 'not equal to' filter with Cast on SI
> -
>
> Key: CARBONDATA-4153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4153
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> A NOT EQUAL TO filter on an SI index column should not be pushed down to the
> SI table.
> Currently, where x!='2' is not pushed down to SI, but where x!=2 (which wraps
> the indexed column in a Cast) is pushed down to SI.
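> A minimal sketch of the intended check, assuming Spark Catalyst expressions
> (the helper name and where it plugs into the SI rewrite are assumptions, not
> CarbonData's actual code):
> {code:java}
> import org.apache.spark.sql.catalyst.expressions.{Cast, EqualTo, Expression, Not}
>
> // 'where x != 2' on a string SI column arrives as Not(EqualTo(Cast(x), 2));
> // such a filter should stay on the main table instead of going to the SI table.
> def isPushableToSI(filter: Expression): Boolean = filter match {
>   case Not(EqualTo(_: Cast, _)) => false
>   case Not(EqualTo(_, _: Cast)) => false
>   case _ => true
> }
> {code}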



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4137) Refactor CarbonDataSourceScan without Spark Filter

2021-03-16 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4137.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Refactor CarbonDataSourceScan without Spark Filter
> --
>
> Key: CARBONDATA-4137
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4137
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: David Cai
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4141) index server is not caching the index files from external sdk table.

2021-03-09 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4141.
--
Fix Version/s: (was: 2.0.1)
   2.1.1
   Resolution: Fixed

> index server is not caching the index files from external sdk table.
> 
>
> Key: CARBONDATA-4141
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4141
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Indexes cached in the executor cache are not dropped when drop table is called 
> for an external SDK table, because external tables with SDK segments have no 
> metadata such as a table status file. So the drop table command sends zero 
> segments to the index server's clearIndexes job, which clears nothing on the 
> executor side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4121) Prepriming is not working in index server

2021-02-18 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4121.
--
Fix Version/s: (was: 2.0.1)
   2.1.1
   Resolution: Fixed

> Prepriming is not working in index server
> -
>
> Key: CARBONDATA-4121
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4121
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Prepriming is always executed in an async thread. Server.getRemoteUser in an 
> async thread causes an NPE, which crashes the index server application.
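> A minimal sketch of the failure pattern and the fix direction (the pool name
> is an assumption; Server.getRemoteUser is Hadoop's RPC API):
> {code:java}
> import java.util.concurrent.Executors
> import org.apache.hadoop.ipc.Server
>
> // getRemoteUser() reads thread-local RPC call state, so it must be captured
> // on the handler thread; calling it from the async thread returns null -> NPE.
> val remoteUser = Server.getRemoteUser()
> val prePrimingPool = Executors.newSingleThreadExecutor()
> prePrimingPool.submit(new Runnable {
>   override def run(): Unit = {
>     // use the captured 'remoteUser' here instead of Server.getRemoteUser()
>   }
> })
> {code}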



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4126) Concurrent Compaction fails with Load on table with SI

2021-02-18 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4126.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Concurrent Compaction fails with Load on table with SI
> --
>
> Key: CARBONDATA-4126
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4126
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> [Steps] :-
> Create table, load data and create SI.
> create table brinjal (imei string,AMSize string,channelsId 
> string,ActiveCountry string, Activecity string,gamePointId 
> double,deviceInformationId double,productionDate Timestamp,deliveryDate 
> timestamp,deliverycharge double) stored as carbondata 
> TBLPROPERTIES('table_blocksize'='1');
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE 
> brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
> '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
> 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
> create index indextable1 ON TABLE brinjal (AMSize) AS 'carbondata';
>  
> From one terminal, load data into the table and, from another terminal, 
> perform minor and major compaction on the table concurrently for some time.
> LOAD DATA INPATH 'hdfs://hacluster/chetan/vardhandaterestruct.csv' INTO TABLE 
> brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
> '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
> 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');
> alter table brinjal compact 'minor';
> alter table brinjal compact 'major';
>  
> [Expected Result] :- Concurrent compaction should succeed with load on a 
> table with SI.
>  
> [Actual Issue] : - Concurrent Compaction fails with Load on table with SI
> *0: jdbc:hive2://linux-32:22550/> alter table brinjal compact 'major';*
> *Error: org.apache.spark.sql.AnalysisException: Compaction failed. Please 
> check logs for more info. Exception in compaction Failed to acquire lock on 
> segment 2, during compaction of table test.brinjal; (state=,code=0)*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4123) Bloom index query with Index server giving incorrect results

2021-02-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4123.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Bloom index query with Index server giving incorrect results
> 
>
> Key: CARBONDATA-4123
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4123
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
>
> Queries: create table and load data so that it can create >1 blocklet.
>  
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 2021-02-04 22:13:29,759 | WARN | pool-24-thread-1 | It is not recommended to 
> set off-heap working memory size less than 512MB, so setting default value to 
> 512 | 
> org.apache.carbondata.core.memory.UnsafeMemoryManager.(UnsafeMemoryManager.java:83)
> 10
> Time taken: 2.417 seconds, Fetched 1 row(s)
> spark-sql> CREATE INDEX dm_rcd ON TABLE test_rcd (city) AS 'bloomfilter' 
> properties ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> 2021-02-04 22:13:58,683 | AUDIT | main | \{"time":"February 4, 2021 10:13:58 
> PM CST","username":"carbon","opName":"CREATE 
> INDEX","opId":"15148202700230273","opStatus":"START"} | 
> carbon.audit.logOperationStart(Auditor.java:74)
> 2021-02-04 22:13:58,759 | WARN | main | Bloom compress is not configured for 
> index dm_rcd, use default value true | 
> org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,292 | WARN | Executor task launch worker for task 2 | 
> Bloom compress is not configured for index dm_rcd, use default value true | 
> org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:13:59,629 | WARN | main | Bloom compress is not configured for 
> index dm_rcd, use default value true | 
> org.apache.carbondata.index.bloom.BloomCoarseGrainIndexFactory.validateAndGetBloomCompress(BloomCoarseGrainIndexFactory.java:202)
> 2021-02-04 22:14:00,331 | AUDIT | main | \{"time":"February 4, 2021 10:14:00 
> PM CST","username":"carbon","opName":"CREATE 
> INDEX","opId":"15148202700230273","opStatus":"SUCCESS","opTime":"1648 
> ms","table":"default.test_rcd","extraInfo":{"provider":"bloomfilter","indexName":"dm_rcd","bloom_size":"64","bloom_fpp":"0.1"}}
>  | carbon.audit.logOperationEnd(Auditor.java:97)
> Time taken: 1.818 seconds
> spark-sql> select count(*) from test_rcd where city = 'city40';
> 30
> Time taken: 0.556 seconds, Fetched 1 row(s)
> spark-sql>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4117) Test cg index query with Index server fails with NPE

2021-02-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4117.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Test cg index query with Index server fails with NPE
> 
>
> Key: CARBONDATA-4117
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4117
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Test queries to execute:
> spark-sql> CREATE TABLE index_test_cg(id INT, name STRING, city STRING, age 
> INT) STORED AS carbondata TBLPROPERTIES('SORT_COLUMNS'='city,name', 
> 'SORT_SCOPE'='LOCAL_SORT');
> spark-sql> create index cgindex on table index_test_cg (name) as 
> 'org.apache.carbondata.spark.testsuite.index.CGIndexFactory';
> LOAD DATA LOCAL INPATH '$file2' INTO TABLE index_test_cg 
> OPTIONS('header'='false')
> spark-sql> select * from index_test_cg where name='n502670';
> 2021-01-29 15:09:25,881 | ERROR | main | Exception occurred while getting 
> splits using index server. Initiating Fallback to embedded mode | 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:454)
> java.lang.reflect.UndeclaredThrowableException
> at com.sun.proxy.$Proxy69.getSplits(Unknown Source)
> at 
> org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:85)
> at 
> org.apache.carbondata.indexserver.DistributedIndexJob$$anonfun$1.apply(IndexJobs.scala:59)
> at 
> org.apache.carbondata.spark.util.CarbonScalaUtil$.logTime(CarbonScalaUtil.scala:769)
> at 
> org.apache.carbondata.indexserver.DistributedIndexJob.execute(IndexJobs.scala:58)
> at 
> org.apache.carbondata.core.index.IndexUtil.executeIndexJob(IndexUtil.java:307)
> at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDistributedSplit(CarbonInputFormat.java:443)
> at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getPrunedBlocklets(CarbonInputFormat.java:555)
> at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:500)
> at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:357)
> at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:205)
> at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalGetPartitions(CarbonScanRDD.scala:159)
> at org.apache.carbondata.spark.rdd.CarbonRDD.getPartitions(CarbonRDD.scala:68)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
> at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:989)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:384)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:988)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:345)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:372)
> at 
> org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:127)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver$$anonfun$run$1.apply(SparkSQLDriver.scala:66)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
> at 
> 

[jira] [Resolved] (CARBONDATA-4082) When a segment is added to a carbon table by alter table add segment query and that segment also have a deleteDelta file present in it then on querying the carbon t

2021-02-04 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4082.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> When a segment is added to a carbon table by the alter table add segment query 
> and that segment also has a deleteDelta file present in it, then on querying 
> the carbon table the deleted rows appear in the result.
> ---
>
> Key: CARBONDATA-4082
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4082
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> When a segment is added to a carbon table by the alter table add segment query 
> and that segment also has a deleteDelta file present in it, then on querying 
> the carbon table the deleted rows appear in the result.
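> A hypothetical reproduction sketch (the external segment path and table are
> assumptions; the add segment syntax follows the example further down in this
> digest):
> {code:java}
> sql("CREATE TABLE target (id INT, name STRING) STORED AS carbondata")
> // 'extseg' holds .carbondata and index files plus a .deletedelta file
> sql("ALTER TABLE target ADD SEGMENT " +
>   "OPTIONS('path'='hdfs://hacluster/extseg','format'='carbondata')")
> sql("SELECT * FROM target").show() // rows covered by the delete delta should be filtered out, but are not
> {code}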



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4113) Partition query results invalid when carbon.read.partition.hive.direct is disabled

2021-02-02 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4113.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Partition query results invalid when carbon.read.partition.hive.direct is 
> disabled
> --
>
> Key: CARBONDATA-4113
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4113
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> set 'carbon.read.partition.hive.direct' to false.
> queries to execute:
> create table partition_cache(a string) partitioned by(b int) stored as 
> carbondata
> insert into partition_cache select 'k',1;
> insert into partition_cache select 'k',1;
> insert into partition_cache select 'k',2;
> insert into partition_cache select 'k',2;
> alter table partition_cache compact 'minor';
> select * from partition_cache; => no results



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4097) Direct filling of column vector is not allowed for an alter table, because it uses RestructureBasedCollector. However ColumnVectors were initialized as ColumnVectorW

2021-01-26 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4097.
--
Fix Version/s: (was: 2.0.1)
   2.1.1
   Resolution: Fixed

> Direct filling of column vector is not allowed for an alter table, because it 
> uses RestructureBasedCollector. However, ColumnVectors were initialized as 
> ColumnVectorWrapperDirect even for alter table.
> --
>
> Key: CARBONDATA-4097
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4097
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Karan
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> ColumnVector for alter tables should not be initialized as 
> ColumnVectorWrapperDirect because direct filling is not allowed for alter 
> tables. It should be initialized as ColumnVectorWrapper instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4096) SDK read fails from cluster and sdk read filter query on sort column giving wrong result with IndexServer

2021-01-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4096.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> SDK read fails from cluster and sdk read filter query on sort column giving 
> wrong result with IndexServer
> -
>
> Key: CARBONDATA-4096
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4096
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.1
>
> Attachments: image-2020-12-22-18-54-52-361.png, 
> wrongresults_with_IS.PNG
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Test write sdk and read with spark.
> Queries to reproduce:
> put written sdk files in $warehouse/sdk path - contains .carbondata and 
> .index files.
> +From spark-sql:+ 
> create table sdkout using carbon options(path='$warehouse/sdk');
> select * from sdkout where salary = 100; 
> !image-2020-12-22-18-54-52-361.png|width=744,height=279!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4055) Empty segment created and unnecessary entry to table status in update

2021-01-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4055.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Empty segment created and unnecessary entry to table status in update
> -
>
> Key: CARBONDATA-4055
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4055
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> When the update command is executed and no data is updated, empty segment 
> directories are created and a stale in-progress entry is added to the table 
> status; those segment dirs are not even cleaned during clean files.
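> A hypothetical sketch of the no-op update (table name and predicate are
> assumptions):
> {code:java}
> sql("CREATE TABLE t (a STRING, b INT) STORED AS carbondata")
> sql("INSERT INTO t SELECT 'k', 1")
> sql("UPDATE t SET (b) = (2) WHERE a = 'x'") // matches no rows
> // expected: no new segment; observed: an empty segment directory plus a stale
> // INSERT IN PROGRESS entry in table status, neither removed by clean files
> {code}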



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3898) Support Option 'carbon.enable.querywithmv'

2021-01-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3898:
-
Issue Type: Improvement  (was: New Feature)

> Support Option 'carbon.enable.querywithmv'
> --
>
> Key: CARBONDATA-3898
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3898
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> When MV is enabled, SQL rewrite takes a lot of time. A new option 
> 'carbon.enable.querywithmv' shall be supported, which turns off SQL rewrite 
> when the configured value is false.
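> A usage sketch, assuming the option is settable as a session property:
> {code:java}
> sql("SET carbon.enable.querywithmv=false") // skip the MV rewrite for this session
> sql("SELECT name, sum(price) FROM maintable GROUP BY name") // answered from the fact table directly
> {code}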



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010

2021-01-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3854:
-
Issue Type: Bug  (was: New Feature)

> Quote char support to unprintable character like \u0009 \u0010
> --
>
> Key: CARBONDATA-3854
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3854
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Mahesh Raju Somalaraju
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Quote char support to unprintable character like \u0009 \u0010
> Currently carbondata does not support setting the quote char to an unprintable 
> character like \u0009.
> The current behaviour is that the quote char throws an exception if we give 
> more than one character.
>  
> Need to support more than one character, the same as for the delimiter.
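> A usage sketch of what should be accepted (file path and table are
> assumptions; the escaped \\u0009 is passed through to the load option):
> {code:java}
> sql("LOAD DATA INPATH 'hdfs://hacluster/data/input.csv' INTO TABLE t " +
>   "OPTIONS('DELIMITER'=',', 'QUOTECHAR'='\\u0009')")
> {code}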



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010

2021-01-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3854.
--
Resolution: Fixed

> Quote char support to unprintable character like \u0009 \u0010
> --
>
> Key: CARBONDATA-3854
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3854
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Mahesh Raju Somalaraju
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Quote char support to unprintable character like \u0009 \u0010
> Currently carbondata does not support setting the quote char to an unprintable 
> character like \u0009.
> The current behaviour is that the quote char throws an exception if we give 
> more than one character.
>  
> Need to support more than one character, the same as for the delimiter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (CARBONDATA-3854) Quote char support to unprintable character like \u0009 \u0010

2021-01-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor reopened CARBONDATA-3854:
--

> Quote char support to unprintable character like \u0009 \u0010
> --
>
> Key: CARBONDATA-3854
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3854
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Mahesh Raju Somalaraju
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Quote char support to unprintable character like \u0009 \u0010
> Currently carbondata does not support setting the quote char to an unprintable 
> character like \u0009.
> The current behaviour is that the quote char throws an exception if we give 
> more than one character.
>  
> Need to support more than one character, the same as for the delimiter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3820) Fix CDC failure when sort columns present in source dataframe

2021-01-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3820:
-
Issue Type: Bug  (was: New Feature)

> Fix CDC failure when sort columns present in source dataframe
> -
>
> Key: CARBONDATA-3820
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3820
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> If there is a GlobalSort table in the CDC flow, the following exception will
> be thrown:
> Exception in thread "main" java.lang.RuntimeException: column: id specified
> in sort columns does not exist in schema
>         at
> org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildTableSchema(CarbonWriterBuilder.java:828)
>         at
> org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildCarbonTable(CarbonWriterBuilder.java:794)
>         at
> org.apache.carbondata.sdk.file.CarbonWriterBuilder.buildLoadModel(CarbonWriterBuilder.java:720)
>         at
> org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil$.prepareLoadModel(CarbonSparkDataSourceUtil.scala:281)
>         at
> org.apache.spark.sql.carbondata.execution.datasources.SparkCarbonFileFormat.prepareWrite(SparkCarbonFileFormat.scala:141)
>         at
> org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processIUD(CarbonMergeDataSetCommand.scala:269)
>         at
> org.apache.spark.sql.execution.command.mutation.merge.CarbonMergeDataSetCommand.processData(CarbonMergeDataSetCommand.scala:152)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4080) Wrong results for select count on invalid segments

2020-12-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4080.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Wrong results for select count on invalid segments
> --
>
> Key: CARBONDATA-4080
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4080
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Wrong results for
>  * select count on marked for delete segment
>  * select count on compacted segment
> The issue occurs only when the user explicitly sets deleted/compacted segments 
> using the property carbon.input.segments.
> As select * on such segments gives 0 rows as output, in order to maintain 
> consistency, select count should also give 0 rows.
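> A hypothetical sketch (segment id and table name are assumptions):
> {code:java}
> // segment 1 was compacted (or marked for delete)
> sql("SET carbon.input.segments.default.t=1")
> sql("SELECT * FROM t").show()        // 0 rows
> sql("SELECT count(*) FROM t").show() // should also report 0, but returns a stale count
> {code}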



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4077) Insert into partition with FileMergeSortComparator is failing with NPE

2020-12-17 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4077.
--
Fix Version/s: (was: 2.2.0)
   2.1.1
   Resolution: Fixed

> Insert into partition with FileMergeSortComparator is failing with NPE
> --
>
> Key: CARBONDATA-4077
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4077
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4050.
--
Resolution: Fixed

> TPC-DS queries performance degraded when compared to older versions due to 
> redundant getFileStatus() invocations
> 
>
> Key: CARBONDATA-4050
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4050
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Venugopal Reddy K
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> *Issue:*
> In the createCarbonDataFileBlockMetaInfoMapping method, we get the list of 
> carbondata files in the segment, loop through all the carbon files and make a 
> map of fileNameToMetaInfoMapping.
>       In that carbon files loop, if the file is of AbstractDFSCarbonFile type, 
> we get the org.apache.hadoop.fs.FileStatus thrice for each file. And the 
> method to get the file status is an RPC call (fileSystem.getFileStatus(path)). 
> It takes ~2ms in the cluster for each call, so we incur an overhead of ~6ms 
> per file. The overall driver-side query processing time therefore increases 
> significantly when there are many carbon files, which caused the TPC-DS query 
> performance degradation.
> The methods/calls which get the file status for each carbon file in the loop 
> are shown below:
> {code:java}
> public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
>     String segmentFilePath, Configuration configuration) throws IOException {
>   Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap<>();
>   CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, configuration);
>   if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof S3CarbonFile)) {
>     PathFilter pathFilter = new PathFilter() {
>       @Override
>       public boolean accept(Path path) {
>         return CarbonTablePath.isCarbonDataFile(path.getName());
>       }
>     };
>     CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter);
>     for (CarbonFile file : carbonFiles) {
>       String[] location = file.getLocations(); // RPC call - 1
>       long len = file.getSize(); // RPC call - 2
>       BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
>       fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC call - 3 in file.getPath() method
>     }
>   }
>   return fileNameToMetaInfoMapping;
> }
> {code}
>  
> *Suggestion:*
> I think we currently make an RPC call to get the file status upon each 
> invocation because the file status may change over a period of time, and we 
> shouldn't cache the file status in AbstractDFSCarbonFile.
>      In the current case, just before the loop over the carbon files, we get 
> the file status of all the carbon files in the segment with the RPC call shown 
> below. LocatedFileStatus is a child class of FileStatus; it carries the 
> BlockLocation along with the file status. 
> {code:java}
> RemoteIterator<LocatedFileStatus> iter = fileSystem.listLocatedStatus(path);
> {code}
> The intention of getting all the file statuses here is to create instances 
> of BlockMetaInfo and maintain the map fileNameToMetaInfoMapping.
> So it is safe to avoid these unnecessary RPC calls to get the file status 
> again in the getLocations(), getSize() and getPath() methods.
>  
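> A sketch of the suggested direction in Scala (BlockMetaInfo is the CarbonData
> type from the snippet above, its import elided; everything else is the plain
> Hadoop API, with no per-file getFileStatus() RPC):
> {code:java}
> import java.util.TreeMap
> import org.apache.hadoop.fs.{FileSystem, LocatedFileStatus, Path}
>
> def buildMapping(fs: FileSystem, segmentPath: Path): TreeMap[String, BlockMetaInfo] = {
>   val mapping = new TreeMap[String, BlockMetaInfo]()
>   val iter = fs.listLocatedStatus(segmentPath) // one listing RPC for the segment
>   while (iter.hasNext) {
>     val status: LocatedFileStatus = iter.next()
>     if (status.getPath.getName.endsWith(".carbondata")) {
>       // block locations and length were already fetched with the listing
>       val hosts = status.getBlockLocations.flatMap(_.getHosts)
>       mapping.put(status.getPath.toString, new BlockMetaInfo(hosts, status.getLen))
>     }
>   }
>   mapping
> }
> {code}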



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4046) Select count(*) fails on partition table.

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4046.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Select count(*) fails on partition table.
> -
>
> Key: CARBONDATA-4046
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4046
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Nihal kumar ojha
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Steps to reproduce
> 1. set property `carbon.read.partition.hive.direct=false`
> 2. Create a table which contains more than one partition column.
> 3. run query select count (*)
>  
> It fails with the exception `Key not found`.
>  
> create table partition_cache(a string) partitioned by(b int, c String) stored 
> as carbondata;
> insert into partition_cache select 'k',1,'nihal';
> select count(*) from partition_cache where b = 1;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.

2020-12-03 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4022.
--
Fix Version/s: 2.1.1
   Resolution: Fixed

> Getting the error - "PathName is not a valid DFS filename." with index server 
> and after adding carbon SDK segments and then doing select/update/delete 
> operations.
> --
>
> Key: CARBONDATA-4022
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4022
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
>  Getting this error - "PathName is not a valid DFS filename." - during 
> update/delete/select queries on a table with an added SDK segment. Also, the 
> path shown in the error is malformed, which is the cause of the error. This is 
> seen only when the index server is running and disable fallback is true.
> Queries and errors:
> > create table sdk_2level_1(name string, rec1 
> > struct>) stored as carbondata;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.425 seconds)
> > alter table sdk_2level_1 add segment 
> > options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata');
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.77 seconds)
> > select * from sdk_2level_1;
> INFO : Execution ID: 1855
> Error: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in 
> stage 600.0 (TID 21345, linux, executor 16): 
> java.lang.IllegalArgumentException: Pathname 
> /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata
>  from 
> hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata
>  is not a valid DFS filename.
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
>  at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
>  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955)
>  at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316)
>  at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293)
>  at 
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198)
>  at 
> org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188)
>  at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100)
>  at 
> org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60)
>  at 
> org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65)
>  at 
> org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902)
>  at 
> org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47)
>  at 
> org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
>  at 
> 

[jira] [Updated] (CARBONDATA-3875) Support show segments include stage

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3875:
-
Fix Version/s: (was: 2.0.2)
   2.1.1

> Support show segments include stage
> ---
>
> Key: CARBONDATA-3875
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3875
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> There is a lack of monitoring of the stage information in the current system, 
> so a 'Show segments include stage' command shall be supported, which will 
> provide monitoring information such as createTime, partition info, etc.
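> A usage sketch (the exact grammar is an assumption based on the description):
> {code:java}
> sql("SHOW SEGMENTS FOR TABLE t INCLUDE STAGE").show()
> {code}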



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3856) Support the LIMIT operator for show segments command

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3856:
-
Fix Version/s: (was: 2.0.2)
   2.1.1

> Support the LIMIT operator for show segments command
> 
>
> Key: CARBONDATA-3856
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3856
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Now, in the 2.0.0 release, CarbonData doesn't support the LIMIT operator in 
> the SHOW SEGMENTS command. The time cost is high when there are too many 
> segments.
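> A usage sketch (the exact grammar is an assumption based on the description):
> {code:java}
> sql("SHOW SEGMENTS FOR TABLE t LIMIT 10").show() // only the 10 most recent segments
> {code}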



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3670) Support compress offheap columnpage directly, avoiding a copy of data from offheap to heap when compressed.

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3670:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Support compress offheap columnpage directly, avoiding a copy of data from 
> offheap to heap when compressed.
> --
>
> Key: CARBONDATA-3670
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3670
> Project: CarbonData
>  Issue Type: Wish
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When writing data, the column pages are stored off-heap, and the pages are 
> compressed to save storage cost. Currently, in the compression processing, the 
> data is copied from off-heap to the heap before being compressed, which leads 
> to heavier GC overhead compared with compressing off-heap directly.
> To sum up, we should support compressing the off-heap column page directly, 
> avoiding a copy of data from off-heap to heap when compressing.
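> A minimal sketch of the direct-compression idea, using snappy-java's
> ByteBuffer API as an example codec (an assumption, not necessarily
> CarbonData's actual code path):
> {code:java}
> import java.nio.ByteBuffer
> import org.xerial.snappy.Snappy
>
> val offHeapPage: ByteBuffer = ByteBuffer.allocateDirect(64 * 1024) // page lives off-heap
> // ... fill the page, then prepare it for reading ...
> offHeapPage.flip()
> // compress direct buffer -> direct buffer: no intermediate on-heap byte[], less GC
> val compressed = ByteBuffer.allocateDirect(Snappy.maxCompressedLength(offHeapPage.remaining()))
> Snappy.compress(offHeapPage, compressed)
> {code}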



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3615) Show metacache shows the index server index-dictionary files when data loaded after index server disabled using set command

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3615:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> ---
>
> Key: CARBONDATA-3615
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3615
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Show metacache shows the index server index-dictionary files when data loaded 
> after index server disabled using set command
> +-------------+---------+-------------------------+-----------------+
> |    Field    |  Size   |         Comment         | Cache Location  |
> +-------------+---------+-------------------------+-----------------+
> | Index       | 0 B     | 0/2 index files cached  | DRIVER          |
> | Dictionary  | 0 B     |                         | DRIVER          |
> | Index       | 1.5 KB  | 2/2 index files cached  | INDEX SERVER    |
> | Dictionary  | 0 B     |                         | INDEX SERVER    |
> +-------------+---------+-------------------------+-----------------+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3608) Drop 'STORED BY' syntax in create table

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3608:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Drop 'STORED BY' syntax in create table
> ---
>
> Key: CARBONDATA-3608
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3608
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3816) Support Float and Decimal in the Merge Flow

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3816:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Support Float and Decimal in the Merge Flow
> ---
>
> Key: CARBONDATA-3816
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3816
> Project: CarbonData
>  Issue Type: New Feature
>  Components: data-load
>Affects Versions: 2.0.0
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.1
>
>
> We don't support FLOAT and DECIMAL datatype in the CDC Flow. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3746) Support column chunk cache creation and basic read/write

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3746:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Support column chunk cache creation and basic read/write
> 
>
> Key: CARBONDATA-3746
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3746
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4026) Thread leakage while Loading

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4026:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Thread leakage while Loading
> 
>
> Key: CARBONDATA-4026
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4026
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A few code paths in Inserting/Loading/InsertStage/IndexServer do not shut down 
> their ExecutorService, which leads to thread leakage and degrades the 
> performance of the driver and executor.
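> A minimal sketch of the fix pattern (pool size and the submitted work are
> assumptions):
> {code:java}
> import java.util.concurrent.Executors
>
> val loadPool = Executors.newFixedThreadPool(4)
> try {
>   // submit load / insert-stage tasks to loadPool ...
> } finally {
>   loadPool.shutdown() // the missing call: without it every load leaks the pool's threads
> }
> {code}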



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4003) Improve IUD Concurrency

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4003:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Improve IUD Concurrency
> ---
>
> Key: CARBONDATA-4003
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4003
> Project: CarbonData
>  Issue Type: Improvement
>  Components: spark-integration
>Affects Versions: 2.0.1
>Reporter: Kejian Li
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> When some segments of the table are in the INSERT IN PROGRESS state, the 
> update operation on the table fails.
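> A hypothetical two-session sketch of the conflict (table names are
> assumptions):
> {code:java}
> // session 1 (long-running): the new segment stays in INSERT IN PROGRESS state
> sql("INSERT INTO t SELECT * FROM staging")
> // session 2, started concurrently: fails while the insert is still in flight
> sql("UPDATE t SET (b) = (1) WHERE a = 'k'")
> {code}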



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4008) IN filter on date column is returning 0 results when 'carbon.push.rowfilters.for.vector' is true

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4008:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> IN filter on date column is returning 0 results when 
> 'carbon.push.rowfilters.for.vector' is true
> 
>
> Key: CARBONDATA-4008
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4008
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Venugopal Reddy K
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> *Issue:*
> IN filter with date column in condition is returning 0 results when 
> 'carbon.push.rowfilters.for.vector' is set to true.
>  
> *Steps to reproduce:*
> sql("set carbon.push.rowfilters.for.vector=true")
> sql("create table test_table(i int, dt date, ts timestamp) stored as 
> carbondata")
> sql("insert into test_table select 1, '2020-03-30', '2020-03-30 10:00:00'")
> sql("insert into test_table select 2, '2020-07-04', '2020-07-04 14:12:15'")
> sql("insert into test_table select 3, '2020-09-23', '2020-09-23 12:30:45'")
> sql("select * from test_table where dt IN ('2020-03-30', 
> '2020-09-23')").show()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4050) TPC-DS queries performance degraded when compared to older versions due to redundant getFileStatus() invocations

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4050:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> TPC-DS queries performance degraded when compared to older versions due to 
> redundant getFileStatus() invocations
> 
>
> Key: CARBONDATA-4050
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4050
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.0
>Reporter: Venugopal Reddy K
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> *Issue:*
> In the createCarbonDataFileBlockMetaInfoMapping method, we get the list of 
> carbondata files in the segment, loop through all the carbon files and make a 
> map of fileNameToMetaInfoMapping.
>       In that carbon files loop, if the file is of AbstractDFSCarbonFile type, 
> we get the org.apache.hadoop.fs.FileStatus thrice for each file. And the 
> method to get the file status is an RPC call (fileSystem.getFileStatus(path)). 
> It takes ~2ms in the cluster for each call, so we incur an overhead of ~6ms 
> per file. The overall driver-side query processing time therefore increases 
> significantly when there are many carbon files, which caused the TPC-DS query 
> performance degradation.
> The methods/calls which get the file status for each carbon file in the loop 
> are shown below:
> {code:java}
> public static Map<String, BlockMetaInfo> createCarbonDataFileBlockMetaInfoMapping(
>     String segmentFilePath, Configuration configuration) throws IOException {
>   Map<String, BlockMetaInfo> fileNameToMetaInfoMapping = new TreeMap<>();
>   CarbonFile carbonFile = FileFactory.getCarbonFile(segmentFilePath, configuration);
>   if (carbonFile instanceof AbstractDFSCarbonFile && !(carbonFile instanceof S3CarbonFile)) {
>     PathFilter pathFilter = new PathFilter() {
>       @Override
>       public boolean accept(Path path) {
>         return CarbonTablePath.isCarbonDataFile(path.getName());
>       }
>     };
>     CarbonFile[] carbonFiles = carbonFile.locationAwareListFiles(pathFilter);
>     for (CarbonFile file : carbonFiles) {
>       String[] location = file.getLocations(); // RPC call - 1
>       long len = file.getSize(); // RPC call - 2
>       BlockMetaInfo blockMetaInfo = new BlockMetaInfo(location, len);
>       fileNameToMetaInfoMapping.put(file.getPath(), blockMetaInfo); // RPC call - 3 in file.getPath() method
>     }
>   }
>   return fileNameToMetaInfoMapping;
> }
> {code}
>  
> *Suggestion:*
> I think we currently make an RPC call to get the file status upon each 
> invocation because the file status may change over a period of time, and we 
> shouldn't cache the file status in AbstractDFSCarbonFile.
>      In the current case, just before the loop over the carbon files, we get 
> the file status of all the carbon files in the segment with the RPC call shown 
> below. LocatedFileStatus is a child class of FileStatus; it carries the 
> BlockLocation along with the file status. 
> {code:java}
> RemoteIterator<LocatedFileStatus> iter = fileSystem.listLocatedStatus(path);
> {code}
> The intention of getting all the file statuses here is to create instances 
> of BlockMetaInfo and maintain the map fileNameToMetaInfoMapping.
> So it is safe to avoid these unnecessary RPC calls to get the file status 
> again in the getLocations(), getSize() and getPath() methods.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3643) Insert array('')/array() into Struct column will result in array(null), which is inconsistent with Parquet

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3643:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Insert array('')/array() into Struct column will result in array(null), 
> which is inconsistent with Parquet
> --
>
> Key: CARBONDATA-3643
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3643
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>
>  
> {code:java}
> sql("create table datatype_struct_parquet(price struct<b:array<string>>) " +
>   "stored as parquet")
> sql("insert into table datatype_struct_parquet values(named_struct('b', array('')))")
> sql("create table datatype_struct_carbondata(price struct<b:array<string>>) " +
>   "stored as carbondata")
> sql("insert into datatype_struct_carbondata select * from datatype_struct_parquet")
> checkAnswer(
>   sql("SELECT * FROM datatype_struct_carbondata"),
>   sql("SELECT * FROM datatype_struct_parquet"))
> !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> ![[WrappedArray()]]         [[WrappedArray(null)]]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3617) loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3617:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> loadDataUsingGlobalSort should be based on SortColumns Instead Of Whole CarbonRow
> --
>
> Key: CARBONDATA-3617
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3617
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Affects Versions: 1.6.1, 2.0.0
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.1
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> During loading data using global sort, the sort-by processing is based on the 
> whole carbon row, and the GC overhead is huge when there are many columns. 
> Theoretically, the sort-by processing can work just as well based on the sort 
> columns only, which brings less time overhead and GC overhead.
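> A minimal sketch of the idea with a toy row type (the row layout and
> sort-column indices are assumptions, not CarbonData's loader code):
> {code:java}
> import org.apache.spark.rdd.RDD
>
> case class CarbonRow(values: Array[Any])
>
> // key the global-sort shuffle on the sort columns only, instead of comparing whole rows
> def globalSortBySortColumns(rows: RDD[CarbonRow], sortColIdx: Array[Int]): RDD[CarbonRow] =
>   rows
>     .map(r => (sortColIdx.map(i => String.valueOf(r.values(i))).mkString("\u0001"), r))
>     .sortByKey()
>     .values
> {code}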



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3880) How to start JDBC service in distributed index

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3880:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

>  How to start JDBC service in distributed index
> ---
>
> Key: CARBONDATA-3880
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3880
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.0
>Reporter: li
>Priority: Major
> Fix For: 2.1.1
>
>
> How to start JDBC service in distributed index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4031) Query result is incorrect after Delete and Insert overwrite

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4031:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Query result is incorrect after Delete and Insert overwrite
> ---
>
> Key: CARBONDATA-4031
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4031
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 2.0.0
>Reporter: Kejian Li
>Priority: Critical
> Fix For: 2.1.1
>
> Attachments: s_x034_carbon-07.csv, s_x034_carbon-08.csv
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> There is a table with two partitions. The user deletes some records from one 
> of the partitions and then insert-overwrites the other partition. The deleted 
> records in the first partition come back.
> 1.  CREATE TABLE s_x034_carbon (guid STRING, sales_guid STRING) PARTITIONED 
> BY (dt STRING) STORED AS carbondata;
> 2. load data local inpath 
> '/home/lizi/Workspace/carbondata_test_workspace/data/s_x034_carbon-07.csv' 
> into table s_x034_carbon;
>  load data local inpath 
> '/home/lizi/Workspace/carbondata_test_workspace/data/s_x034_carbon-08.csv' 
> into table s_x034_carbon;
> 3. select count(1), dt from s_x034_carbon group by dt;
> 4. select * from s_x034_carbon where dt=20200907 limit 5;
> 5. delete from s_x034_carbon where dt= 20200907 and 
> guid='595E1862D81A09D0E1008000AC1E0124';
> delete from s_x034_carbon where dt= 20200907 and 
> guid='005056AF06441EDA89ABF853E435A6BD';
> 6. select count(1), dt from s_x034_carbon group by dt;
> 7. insert overwrite table s_x034_carbon partition (dt=20200908)
>  select a.guid as guid, a.sales_guid as sales_guid from s_x034_carbon a 
>  where dt = 20200907;
> 8. select count(1), dt from s_x034_carbon group by dt;
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4032) Drop partition command cleans other partition directories

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-4032:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Drop partition command cleans other partition directories
> -
>
> Key: CARBONDATA-4032
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4032
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Critical
> Fix For: 2.1.1
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> 1. CREATE TABLE droppartition (id STRING, sales STRING) PARTITIONED BY (dtm 
> STRING)STORED AS carbondata
> 2. insert into droppartition values ('01', '0', '20200907'), ('03', '0', 
> '20200908');
> 3. insert overwrite table droppartition partition (dtm=20200908) select * 
> from droppartition where dtm = 20200907;
> insert overwrite table droppartition partition (dtm=20200909) select * from 
> droppartition where dtm = 20200907;
> 4. alter table droppartition drop partition (dtm=20200909)
> The directory "20200908" was deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3559) Support adding carbon file into CarbonData table

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3559:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Support adding carbon file into CarbonData table
> 
>
> Key: CARBONDATA-3559
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3559
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Since adding parquet/orc files into a CarbonData table is supported now, 
> adding carbon files should be supported as well.
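> A hedged sketch of the intended usage, mirroring the existing parquet/orc 
> support (table name, path and option values are illustrative):
> {code:scala}
> // Register an existing carbon segment folder into the table, analogous to
> // adding parquet/orc segments via ADD SEGMENT.
> sql("alter table target_table add segment " +
>   "options('path'='hdfs://tmp/external/carbon_seg', 'format'='carbon')")
> {code}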



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3603) Feature Change in CarbonData 2.0

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3603:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> Feature Change in CarbonData 2.0
> 
>
> Key: CARBONDATA-3603
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3603
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 2.1.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3370) fix missing version of maven-duplicate-finder-plugin

2020-11-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3370:
-
Fix Version/s: (was: 2.1.0)
   2.1.1

> fix missing version of maven-duplicate-finder-plugin
> 
>
> Key: CARBONDATA-3370
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3370
> Project: CarbonData
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.5.3
>Reporter: lamber-ken
>Priority: Critical
> Fix For: 2.1.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> fix missing version of maven-duplicate-finder-plugin in pom file



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3991) File system could not set modified time because the settime function is not overridden

2020-11-19 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor updated CARBONDATA-3991:
-
Fix Version/s: (was: 2.0.1)
   2.1.1

> File system could not set modified time because the settime function is not 
> overridden
> ---
>
> Key: CARBONDATA-3991
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3991
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.0.1
>Reporter: jingpan xiong
>Priority: Major
> Fix For: 2.1.1
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> File systems like S3 and Alluxio don't override the settime function, which 
> causes problems for update and create MV operations. This bug doesn't raise 
> an exception on setting the modified time and may leave a null value as the 
> modified time. It can cause multi-tenant and data consistency problems.
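> A hedged illustration of the failure mode (the trait and class names are 
> illustrative, not the CarbonFile API):
> {code:scala}
> trait StoreFile {
>   def setLastModifiedTime(ts: Long): Boolean
> }
> 
> // An object store such as S3 cannot update metadata in place; overriding the
> // method to report failure explicitly beats silently leaving the time unset.
> class S3File extends StoreFile {
>   override def setLastModifiedTime(ts: Long): Boolean = false
> }
> {code}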



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4006) Get count method of Index server gives currentUser as NULL in fallback mode, which can later lead to a NullPointerException

2020-11-02 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4006.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Get count method of Index server gives currentUser as NULL in fallback mode, 
> which can later lead to a NullPointerException
> -
>
> Key: CARBONDATA-4006
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4006
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vikram Ahuja
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store

2020-11-02 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4043.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Fix data load failure issue for columns added in legacy store
> -
>
> Key: CARBONDATA-4043
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4043
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a dimension is added in older versions like 1.1, by default it will be a 
> sort column. In the sort step we assume sort-column data will come at the 
> beginning of the row, but the added column will be at the end even though it 
> is a sort column. So, while building the data load configuration, we 
> rearrange the columns (dimensions and data fields) to bring the sort columns 
> to the beginning and the no-sort columns to the end, and revert them back to 
> schema order before the FinalMerge/DataWriter step.
> Issue:
>  Data loading is failing because of a cast exception in the data writer step 
> in case of NO_SORT and in the final sort step in case of LOCAL_SORT.
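> A hedged sketch of the rearrangement described above (the Field model is 
> illustrative):
> {code:scala}
> case class Field(name: String, isSortColumn: Boolean)
> 
> // Sort step: bring sort columns to the beginning, no-sort columns to the end.
> def toSortOrder(schema: Seq[Field]): Seq[Field] = {
>   val (sortCols, noSortCols) = schema.partition(_.isSortColumn)
>   sortCols ++ noSortCols
> }
> 
> // Before the FinalMerge/DataWriter step: restore the original schema order.
> def toSchemaOrder(rearranged: Seq[Field], schema: Seq[Field]): Seq[Field] =
>   schema.map(f => rearranged.find(_.name == f.name).get)
> {code}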



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4007) ArrayIndexOutofBoundsException when IUD operations performed using SDK

2020-10-27 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4007.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> ArrayIndexOutofBoundsException when IUD operations performed using SDK
> --
>
> Key: CARBONDATA-4007
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4007
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
> Environment: Spark 2.4.5 jars used for compilation of SDK 
>Reporter: Chetan Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Issue -
> ArrayIndexOutofBoundsException when IUD operations performed using SDK.
> Exception -
> java.lang.ArrayIndexOutOfBoundsException: 1
>  at 
> org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.close(CarbonTableOutputFormat.java:579)
>  at org.apache.carbondata.sdk.file.CarbonIUD.delete(CarbonIUD.java:110)
>  at 
> org.apache.carbondata.sdk.file.CarbonIUD.deleteExecution(CarbonIUD.java:238)
>  at org.apache.carbondata.sdk.file.CarbonIUD.closeDelete(CarbonIUD.java:123)
>  at org.apache.carbondata.sdk.file.CarbonIUD.commit(CarbonIUD.java:221)
>  at com.apache.spark.SdkIUD_Test.testDelete(SdkIUD_Test.java:130)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>  at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  at 
> com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
>  at 
> com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
>  at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3979) Added Hive local dictionary support example

2020-10-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3979.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Added Hive local dictionary support example
> ---
>
> Key: CARBONDATA-3979
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3979
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SHREELEKHYA GAMPA
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
>  To verify local dictionary support in hive for the carbon tables created 
> from spark.
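> A hedged example of the kind of table such a test exercises (table and column 
> names are illustrative; the TBLPROPERTIES keys are CarbonData's local 
> dictionary options):
> {code:scala}
> // Create from Spark with local dictionary enabled, load a row, then read the
> // same table from Hive beeline to verify the values round-trip.
> sql("create table hive_local_dict(id int, name string) stored as carbondata " +
>   "tblproperties('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_INCLUDE'='name')")
> sql("insert into hive_local_dict select 1, 'ram'")
> // From Hive beeline: select * from hive_local_dict;
> {code}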



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3999) The permission of IndexServer's temporary directory /tmp/indexservertmp is not 777 after running for some time.

2020-10-23 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3999.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> The permission of IndexServer's temporary directory /tmp/indexservertmp is 
> not 777 after running for some time.
> --
>
> Key: CARBONDATA-3999
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3999
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.0.0
>Reporter: renhao
>Priority: Critical
>  Labels: IndexServer
> Fix For: 2.1.0
>
> Attachments: 4700942c-3158-424f-8861-3dfcb6fae205.png
>
>
> 1. Start the index server in FI. Check that the permission of 
> "/tmp/indexservertmp" in hdfs is 777.
> 2. After running for some time, an error occurred when using the index 
> server, and the permission of "/tmp/indexservertmp" had become 755.
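> A hedged sketch of the kind of fix this needs (the helper name is 
> illustrative; the calls are Hadoop's FileSystem API):
> {code:scala}
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.fs.permission.FsPermission
> 
> // Re-assert 777 after creating the temp dir, so a restrictive umask or a
> // recreation of the directory doesn't leave it at 755.
> def ensureTmpDirPermission(fs: FileSystem, dir: Path): Unit = {
>   if (!fs.exists(dir)) fs.mkdirs(dir)
>   fs.setPermission(dir, new FsPermission(Integer.parseInt("777", 8).toShort))
> }
> {code}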



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3994) Skip Order by for map task if it is sort column and use limit pushdown for array_contains filter

2020-10-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3994.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Skip Order by for map task if it is sort column and use limit pushdown for 
> array_contains filter
> 
>
> Key: CARBONDATA-3994
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3994
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When the order-by column is a sort column, every map task's output will 
> already be sorted, so there is no need to sort the data again.
> Hence we skip the order-by at the map task by changing the plan node from 
> {{TakeOrderedAndProject}} --> {{CarbonTakeOrderedAndProjectExec}}.
> Also, in this scenario the limit is collected at the map task, and 
> array_contains() uses this limit value during row-scan filtering to break the 
> scan once the limit is reached.
> A carbon property was also added to control this:
> {{carbon.mapOrderPushDown._.column}}
> Note: later we can improve this for other filters also to use the limit value.
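> A hedged sketch of the limit-aware scan (signatures are illustrative):
> {code:scala}
> // Because map output is already sorted on the sort column, each task can stop
> // scanning as soon as it has produced `limit` matching rows.
> def scanWithLimit(rows: Iterator[Array[AnyRef]],
>                   matches: Array[AnyRef] => Boolean,
>                   limit: Int): Iterator[Array[AnyRef]] =
>   rows.filter(matches).take(limit)
> {code}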



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3937) Insert into select from another carbon/parquet table is not working on Hive Beeline on a newly created Hive write format - carbon table. We are getting “Database is

2020-10-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3937.
--
Resolution: Invalid

Not able to reproduce in master code, please recheck

> Insert into select from another carbon/parquet table is not working on Hive 
> Beeline on a newly created Hive write format - carbon table. We are getting 
> “Database is not set" error.
> 
>
> Key: CARBONDATA-3937
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3937
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
>
> Insert into select from another carbon or parquet table to a carbon table is 
> not working on Hive Beeline on a newly created Hive write format carbon 
> table. We get a “Database is not set” error.
>  
> Test queries:
>  drop table if exists hive_carbon;
> create table hive_carbon(id int, name string, scale decimal, country string, 
> salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
> insert into hive_carbon select 1,"Ram","2.3","India",3500;
> insert into hive_carbon select 2,"Raju","2.4","Russia",3600;
> insert into hive_carbon select 3,"Raghu","2.5","China",3700;
> insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;
>  
> drop table if exists hive_carbon2;
> create table hive_carbon2(id int, name string, scale decimal, country string, 
> salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
> insert into hive_carbon2 select * from hive_carbon;
> select * from hive_carbon;
> select * from hive_carbon2;
>  
>  --execute below queries in spark-beeline;
> create table hive_table(id int, name string, scale decimal, country string, 
> salary double);
>  create table parquet_table(id int, name string, scale decimal, country 
> string, salary double) stored as parquet;
>  insert into hive_table select 1,"Ram","2.3","India",3500;
>  select * from hive_table;
>  insert into parquet_table select 1,"Ram","2.3","India",3500;
>  select * from parquet_table;
> --execute the below query in hive beeline;
> insert into hive_carbon select * from parquet_table;
> Attached the logs for your reference. But the insert into select from the 
> parquet and hive table into carbon table is working fine.
>  
> Only insert into select from a hive table to a carbon table is working.
> Error details in the MR job which runs through the hive query:
> Error: java.io.IOException: java.io.IOException: Database name is not set. at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>  at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
>  at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
>  at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:175) 
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at 
> org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169) Caused by: 
> java.io.IOException: Database name is not set. at 
> org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
>  at 
> org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
>  at 
> org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
>  at 
> org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
>  at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
>  ... 9 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4036) When the ` character is present in a column name, the table creation fails

2020-10-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4036.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> When the ` character is present in a column name, the table creation fails
> 
>
> Key: CARBONDATA-4036
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4036
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When the ` character is present in a column name, the table creation fails
> sql("create table special_char(`i#d` string, `nam(e` 

[jira] [Resolved] (CARBONDATA-4004) Wrong result in Presto select query after executing update

2020-10-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4004.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Wrong result in Presto select query after executing update
> --
>
> Key: CARBONDATA-4004
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4004
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, presto-integration
>Reporter: Akshay
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Presto select query after an update operation returns a different number of rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4012) Documentations issues.

2020-10-20 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4012.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Documentations issues.
> --
>
> Key: CARBONDATA-4012
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4012
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.0
>
>
> Support reading Array and Struct of all primitive types on presto from Spark 
> carbon tables. These feature details have to be added in the below open 
> source link:
> [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3975) Data mismatch when the binary data is read via hive in carbon.

2020-10-08 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3975.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Data mismatch when the binary data is read via hive in carbon.
> --
>
> Key: CARBONDATA-3975
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3975
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Data mismatch when the binary data is read via hive in carbon. Carbon gives 
> wrong data compared to the hive table for the same input data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4019) CDC fails when the join expression contains AND or any other logical expression

2020-10-08 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4019.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> CDC fails when the join expression contains AND or any other logical expression
> -
>
> Key: CARBONDATA-4019
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4019
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Akash R Nilugal
>Assignee: Akash R Nilugal
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> CDC fails when the join expression contains AND or any other logical 
> expression; it fails with a cast exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4016) NPE and FileNotFound in Show Segments and Insert Stage

2020-10-06 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4016.
--
Resolution: Fixed

> NPE and FileNotFound in Show Segments and Insert Stage
> --
>
> Key: CARBONDATA-4016
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4016
> Project: CarbonData
>  Issue Type: Bug
>  Components: flink-integration, spark-integration
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> # Insert stage: while Spark reads stages that are being written by Flink at 
> the same time, a JSON format exception will be thrown.
>  # Show segments with STAGE: when reading stages that are being written by 
> Flink or deleted by Spark, a JSON format exception will be thrown.
>  # Show segments loads partition info for a non-partition table, which shall 
> be avoided.
>  # In getLastModifiedTime of TableStatus, if the loadendtime is empty, 
> getLastModifiedTime throws an NPE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4014) Support Change Column Comment

2020-09-29 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4014.
--
Resolution: Fixed

> Support Change Column Comment
> -
>
> Key: CARBONDATA-4014
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4014
> Project: CarbonData
>  Issue Type: New Feature
>  Components: sql
>Affects Versions: 2.0.1
>Reporter: Xingjun Hao
>Priority: Minor
> Fix For: 2.1.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Now, we support adding a comment in CREATE TABLE and ADD COLUMN, but we do 
> not support altering the comment of a specified column. 
> We shall support altering comments with the hive syntax
> "ALTER TABLE table_name CHANGE [COLUMN] col_name col_name data_type [COMMENT 
> col_comment]"
>  
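> A hedged example of the proposed syntax applied to a carbon table (table and 
> column names are illustrative):
> {code:scala}
> sql("alter table customer change age age int comment 'age of the customer'")
> {code}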



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3997) Issue in decimal value reading for negative numbers from presto

2020-09-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3997.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Issue in decimal value reading for negative numbers from presto
> ---
>
> Key: CARBONDATA-3997
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3997
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When a complex decimal column is stored with the DIRECT_COMPRESS codec, 
> DataTypeUtil#bigDecimalToByte is used to create the byte array.
> So, while decoding it, we need to use DataTypeUtil#byteToBigDecimal to get 
> back the proper value.
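> A hedged round-trip sketch, assuming CarbonData's DataTypeUtil helpers named 
> above:
> {code:scala}
> import org.apache.carbondata.core.util.DataTypeUtil
> 
> // Encode with bigDecimalToByte, so the decoder must use the matching
> // byteToBigDecimal; a plain unscaled-bytes parse mis-reads negative values.
> val original = new java.math.BigDecimal("-123.45")
> val bytes = DataTypeUtil.bigDecimalToByte(original)
> val decoded = DataTypeUtil.byteToBigDecimal(bytes)
> assert(decoded.compareTo(original) == 0)
> {code}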



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4001) SI global sort load on partition table results in 0 rows

2020-09-22 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-4001.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> SI global sort load on partition table results in 0 rows
> 
>
> Key: CARBONDATA-4001
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4001
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> On a partition table, when an SI is created with global sort and data is 
> loaded, the SI shows 0 rows because the main table query result is 0 rows.
> For a partition table, in the local sort SI flow, {{current.segmentfile}} is 
> set in {{CarbonSecondaryIndexRDD}}.
> For global sort, this value was not set, so the main table query was 
> returning 0 rows. Now this value is set for the global sort flow also.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3998) FileNotFoundException being thrown in hive during insert.

2020-09-21 Thread Kunal Kapoor (Jira)
Kunal Kapoor created CARBONDATA-3998:


 Summary:  FileNotFoundException being thrown in hive during insert.
 Key: CARBONDATA-3998
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3998
 Project: CarbonData
  Issue Type: Bug
Reporter: Kunal Kapoor
Assignee: Kunal Kapoor






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3995) Support presto querying older complex type stores

2020-09-18 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3995.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Support presto querying older complex type stores
> -
>
> Key: CARBONDATA-3995
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3995
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ajantha Bhat
>Assignee: Ajantha Bhat
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Before carbon 2.0, the complex child length was stored as SHORT for string, 
> varchar, binary, date and decimal types.
> In 2.0 it is stored as INT, so the presto complex-query code always assumes 
> INT and hits an out-of-bounds exception when an old store is queried.
>  
> If the INT_LENGTH_COMPLEX_CHILD_BYTE_ARRAY encoding is present, parse as INT; 
> otherwise parse as SHORT, so that both stores can be queried.
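> A hedged sketch of the decoding decision (the buffer handling is 
> illustrative):
> {code:scala}
> import java.nio.ByteBuffer
> 
> // Pick the child-length width from the encoding list, so both legacy (SHORT)
> // and 2.0+ (INT) stores parse correctly.
> def readChildLength(buf: ByteBuffer, hasIntLengthEncoding: Boolean): Int =
>   if (hasIntLengthEncoding) buf.getInt   // carbon >= 2.0 wrote INT lengths
>   else buf.getShort.toInt                // older stores wrote SHORT lengths
> {code}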



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3982) Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-13 Thread Kunal Kapoor (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kunal Kapoor resolved CARBONDATA-3982.
--
Fix Version/s: 2.1.0
   Resolution: Fixed

> Use Partition instead of Span to split legacy and non-legacy segments for 
> executor distribution in indexserver 
> ---
>
> Key: CARBONDATA-3982
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3982
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.1.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
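> A hedged illustration of why partition (rather than span) is the right split 
> here (the segment model is illustrative):
> {code:scala}
> case class Segment(id: String, isLegacy: Boolean)
> 
> val segments = Seq(Segment("0", true), Segment("1", false), Segment("2", true))
> 
> // span stops at the first non-matching element, so it only works when legacy
> // segments are contiguous: here it yields ("0") and ("1", "2").
> val (spanLegacy, spanRest) = segments.span(_.isLegacy)
> 
> // partition splits by the predicate regardless of order: ("0", "2") / ("1").
> val (legacy, nonLegacy) = segments.partition(_.isLegacy)
> {code}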




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

