[jira] [Created] (CARBONDATA-4300) Clean files command supports specify segment ids

2021-10-08 Thread Yahui Liu (Jira)
Yahui Liu created CARBONDATA-4300:
-

 Summary: Clean files command supports specify segment ids
 Key: CARBONDATA-4300
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4300
 Project: CarbonData
  Issue Type: New Feature
  Components: sql
Reporter: Yahui Liu


Clean files command supports specifying segment IDs. The syntax is "clean files for 
table table_name options("segment_ids"="id1,id2,id3...")". If segment IDs are 
specified, only the segments with those IDs will be deleted physically.
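
A usage sketch of the proposed syntax (hypothetical until the feature is implemented; the table name and segment IDs below are placeholders):

```sql
-- Physically delete only segments 1, 3 and 5 of my_table
CLEAN FILES FOR TABLE my_table OPTIONS('segment_ids'='1,3,5');
```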



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4291) Carbon hive table supports float datatype

2021-10-08 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4291:
--
Affects Version/s: 2.2.0 (was: 2.1.1)

> Carbon hive table supports float datatype
> -
>
> Key: CARBONDATA-4291
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4291
> Project: CarbonData
>  Issue Type: New Feature
>  Components: sql
>Affects Versions: 2.2.0
>Reporter: Yahui Liu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently, when creating a carbon hive table, a float data type is converted 
> to double, so all float data is stored as double.
> In the CTAS scenario, if a source table column is of float type, the data in the 
> newly created carbon table will be incorrect.
> Reproduce steps:
> CREATE TABLE p1(f float) stored as parquet;
> insert into table p1 select 12.36;
> create table carbon1 stored as carbondata as select * from p1;
> select * from carbon1;
> Result:
> 5.410467587E-315
> Carbon should support store float datatype directly.
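
The garbage value above is exactly what you get if the four bytes of the stored float32 are read back as the low half of a float64 whose high bytes are zero, yielding a tiny subnormal double. A small Python sketch, independent of CarbonData, that reproduces the number:

```python
import struct

# float32 bit pattern of 12.36 (4 bytes, little-endian)
f_bits = struct.pack('<f', 12.36)

# Reinterpret those 32 bits as the LOW half of a 64-bit double whose
# high 32 bits are zero -- a subnormal double, not the number 12.36.
d = struct.unpack('<d', f_bits + b'\x00\x00\x00\x00')[0]

print(d)  # ≈ 5.410467587e-315, the garbage value reported above
```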



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-4215) When carbon.enable.vector.reader=false and upon adding a parquet segment through alter add segments in a carbon table , we are getting error in count(*)

2021-10-08 Thread Indhumathi (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi resolved CARBONDATA-4215.

Fix Version/s: 2.3.0
   Resolution: Fixed

> When carbon.enable.vector.reader=false and upon adding a parquet segment 
> through alter add segments in a carbon table , we are getting error in 
> count(*)
> 
>
> Key: CARBONDATA-4215
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4215
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.1
> Environment: 3 node FI
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.3.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> When carbon.enable.vector.reader=false, adding a parquet segment to a carbon 
> table through alter add segment and then running count(*) fails with a 
> ClassCastException.
>  
> Test queries:
> --set carbon.enable.vector.reader=false in carbon.properties;
> use default;
> drop table if exists uniqdata;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> drop table if exists uniqdata_parquet;
> CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> String,active_emui_version string, dob timestamp, doj timestamp, 
> bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> decimal_column2 decimal(36,36),double_column1 double, double_column2 
> double,integer_column1 int) stored as parquet;
> insert into uniqdata_parquet select * from uniqdata;
> create database if not exists test;
> use test;
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
>  select count(*) from uniqdata; -- throwing error class cast exception;
>  
> Error Log traces:
> java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch 
> cannot be cast to org.apache.spark.sql.catalyst.InternalRow
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
>  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
>  at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:58)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:413)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1551)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2021-06-19 13:50:59,035 | WARN | task-result-getter-2 | Lost task 0.0 in 
> stage 4.0 (TID 28, localhost, executor driver): java.lang.ClassCastException: 
> 

[jira] [Updated] (CARBONDATA-4297) Create table(Carbon and Parquet) with combination of partitioned by, Clustered by, Sorted by and with options parameter and insert overwrite fails with parser errors

2021-10-08 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated CARBONDATA-4297:

Description: 
*Issue 1: Create table (Carbon and Parquet) with a combination of PARTITIONED 
BY, CLUSTERED BY and SORTED BY fails -*

*Queries-*

CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as carbondata
 OPTIONS (a '1', b '2')
 PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 COMMENT 'table_comment'
 TBLPROPERTIES (t 'test');
 CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as parquet
 OPTIONS (a '1', b '2')
 PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 COMMENT 'table_comment'
 TBLPROPERTIES (t 'test');

0: jdbc:hive2://7.187.185.158:23040/default> CREATE TABLE t (a STRING, b INT, c 
STRING, d STRING) stored as carbondata
 0: jdbc:hive2://7.187.185.158:23040/default> OPTIONS (a '1', b '2')
 0: jdbc:hive2://7.187.185.158:23040/default> PARTITIONED BY (c, d) CLUSTERED 
BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 0: jdbc:hive2://7.187.185.158:23040/default> COMMENT 'table_comment'
 0: jdbc:hive2://7.187.185.158:23040/default> TBLPROPERTIES (t 'test');
 Error: org.apache.spark.sql.AnalysisException: == Spark Parser: 
org.apache.spark.sql.execution.SparkSqlParser ==

mismatched input 'OPTIONS' expecting (line 2, pos 0)

== SQL ==
 CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as carbondata
 OPTIONS (a '1', b '2')
 ^^^
 PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 COMMENT 'table_comment'
 TBLPROPERTIES (t 'test')

== Carbon Parser: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
 [1.8] failure: identifier matching regex (?i)MATERIALIZED expected

CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as carbondata
 ^;
 == Antlr Parser: org.apache.spark.sql.parser.CarbonAntlrParser ==
 Antlr SQL Parser will only deal with Merge Into SQL Command; (state=,code=0)
 0: jdbc:hive2://7.187.185.158:23040/default> CREATE TABLE t (a STRING, b INT, 
c STRING, d STRING) stored as parquet
 0: jdbc:hive2://7.187.185.158:23040/default> OPTIONS (a '1', b '2')
 0: jdbc:hive2://7.187.185.158:23040/default> PARTITIONED BY (c, d) CLUSTERED 
BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 0: jdbc:hive2://7.187.185.158:23040/default> COMMENT 'table_comment'
 0: jdbc:hive2://7.187.185.158:23040/default> TBLPROPERTIES (t 'test');
 Error: org.apache.spark.sql.AnalysisException: == Spark Parser: 
org.apache.spark.sql.execution.SparkSqlParser ==

mismatched input 'OPTIONS' expecting (line 2, pos 0)

== SQL ==
 CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as parquet
 OPTIONS (a '1', b '2')
 ^^^
 PARTITIONED BY (c, d) CLUSTERED BY (a) SORTED BY (b ASC) INTO 2 BUCKETS
 COMMENT 'table_comment'
 TBLPROPERTIES (t 'test')

== Carbon Parser: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
 [1.8] failure: identifier matching regex (?i)MATERIALIZED expected

CREATE TABLE t (a STRING, b INT, c STRING, d STRING) stored as parquet
 ^;
 == Antlr Parser: org.apache.spark.sql.parser.CarbonAntlrParser ==
 Antlr SQL Parser will only deal with Merge Into SQL Command; (state=,code=0)
 0: jdbc:hive2://7.187.185.158:23040/default>
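
A note on the syntax (an observation, not a confirmed root cause): in Spark SQL, OPTIONS (...) belongs to the datasource CREATE TABLE ... USING provider grammar, while Hive-style STORED AS tables carry their properties in TBLPROPERTIES, so a statement mixing STORED AS with OPTIONS is rejected by every parser in the chain. A sketch of the datasource form that the Spark grammar is designed to accept (column and property names taken from the report; not verified on this cluster):

```sql
-- OPTIONS is legal in the USING (datasource) form, not with STORED AS
CREATE TABLE t2 (a STRING, b INT, c STRING, d STRING)
USING carbondata
OPTIONS (a '1', b '2')
PARTITIONED BY (c, d)
COMMENT 'table_comment'
TBLPROPERTIES (t 'test');
```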

 

*Issue 2: Create table with OPTIONS parameter fails -*

*Queries-*

CREATE TABLE tbl (a INT, b STRING, c INT) stored as carbondata OPTIONS ('a' 1);
 CREATE TABLE tbl1 (a INT, b STRING, c INT) stored as parquet OPTIONS ('a' 1);

 

0: jdbc:hive2://7.187.185.158:23040/default> CREATE TABLE tbl (a INT, b STRING, 
c INT) stored as carbondata OPTIONS ('a' 1);
 Error: org.apache.spark.sql.AnalysisException: == Spark Parser: 
org.apache.spark.sql.execution.SparkSqlParser ==

mismatched input 'OPTIONS' expecting (line 1, pos 63)

== SQL ==
 CREATE TABLE tbl (a INT, b STRING, c INT) stored as carbondata OPTIONS ('a' 1)
 ---^^^

== Carbon Parser: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
 [1.8] failure: identifier matching regex (?i)MATERIALIZED expected

CREATE TABLE tbl (a INT, b STRING, c INT) stored as carbondata OPTIONS ('a' 1)
 ^;
 == Antlr Parser: org.apache.spark.sql.parser.CarbonAntlrParser ==
 Antlr SQL Parser will only deal with Merge Into SQL Command; (state=,code=0)
 0: jdbc:hive2://7.187.185.158:23040/default> CREATE TABLE tbl1 (a INT, b 
STRING, c INT) stored as parquet OPTIONS ('a' 1);
 Error: org.apache.spark.sql.AnalysisException: == Spark Parser: 
org.apache.spark.sql.execution.SparkSqlParser ==

mismatched input 'OPTIONS' expecting (line 1, pos 61)

== SQL ==
 CREATE TABLE tbl1 (a INT, b STRING, c INT) stored as parquet OPTIONS ('a' 1)
 -^^^

== Carbon Parser: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
 [1.8] failure: identifier matching regex (?i)MATERIALIZED expected

CREATE TABLE tbl1 (a INT, b 

[jira] [Updated] (CARBONDATA-4297) Create table(Carbon and Parquet) with combination of partitioned by, Clustered by, Sorted by and with options parameter and insert overwrite fails with parser errors

2021-10-08 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated CARBONDATA-4297:

 Attachment: image-2021-10-08-12-51-14-837.png