[jira] [Closed] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.

2020-11-23 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-4021.
-
Resolution: Not A Problem

> With Index server running, Upon executing count* we are getting the below 
> error, after adding the parquet and ORC segment. 
> ---
>
> Key: CARBONDATA-4021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4021
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Prasanna Ravichandran
> Priority: Major
>
> We are seeing the issue below when the index server is enabled and the index 
> server fallback disable property is set to true. After adding the Parquet and 
> ORC segments, count(*) fails with the error below.
> Queries and error:
> > use rps;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.054 seconds)
> > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.229 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.756 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 95
> +-+
> |Result|
> +-+
> +-+
> No rows selected (2.789 seconds)
>  > use default;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.052 seconds)
>  > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.122 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |  Result  |
> +-+
> +-+
> No rows selected (0.508 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 108
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.316 seconds)
> > drop table if exists uniqdata_parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.668 seconds)
> > CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> > String,active_emui_version string, dob timestamp, doj timestamp, 
> > bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> > decimal_column2 decimal(36,36),double_column1 double, double_column2 
> > double,integer_column1 int) stored as parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.397 seconds)
> > insert into uniqdata_parquet select * from uniqdata;
> INFO  : Execution ID: 116
> +-+
> |Result|
> +-+
> +-+
> No rows selected (4.805 seconds)
> >  drop table if exists uniqdata_orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.553 seconds)
> > CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) using orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.396 seconds)
> > insert into uniqdata_orc select * from uniqdata;
> INFO  : Execution ID: 122
> +-+
> |Result|
> +-+
> +-+
> No rows selected (3.403 seconds)
> > use rps;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.06 seconds)
> > Alter table uniqdata add segment options 
> > ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
> INFO  : Execution ID: 126
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.511 seconds)
> > Alter table uniqdata add segment options 

[jira] [Comment Edited] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2020-11-06 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227384#comment-17227384
 ] 

Prasanna Ravichandran edited comment on CARBONDATA-4029 at 11/6/20, 12:47 PM:
--

This works fine and the test passes with newly created SDK segments. Some old 
SDK segments with future timestamps will be considered invalid segments because 
of other scenarios in delete/update. Old SDK files with future timestamp values 
cannot be fixed. 


was (Author: prasanna ravichandran):
Working fine and passed with the newly created SDK segments. Some old SDK 
segments with future timestamp will be considered as consider invalid segment 
because of some other scenarios in delete/update. Old SDK files with future 
timestamp values, cannot be fixed. 

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Attachments: Primitive.rar
>
>
> After a delete on a table that has SDK segments added through alter table add 
> segment, count(*) returns 0. It stays 0 no matter how many SDK segments are 
> added afterwards.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
> --before executing the alter add segment below, place the attached SDK files in HDFS under the /sdkfiles/primitive2 folder;
> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> select * from external_primitive;
> delete from external_primitive where id =2;
> select * from external_primitive;
> Console output:
> /> drop table if exists external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, 
> salary double, active boolean, dob date, doj timestamp, city string, dept 
> string) stored as carbondata;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
> | 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT    |
> | 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi     | DATA  |
> | 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2;select * from 
> external_primitive;
> INFO : Execution ID: 322
> ++
> | Deleted Row Count |
> ++
> | 1 |
> ++
> 1 row selected (3.723 seconds)
> +-+---+---+-+-+--+--+---+---+
> | id | name | rank | salary | active | dob | doj | city | dept |
> 

[jira] [Commented] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2020-11-06 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227384#comment-17227384
 ] 

Prasanna Ravichandran commented on CARBONDATA-4029:
---

This works fine and the test passes with newly created SDK segments. Some old 
SDK segments with future timestamps will be considered invalid segments because 
of other scenarios in delete/update. Old SDK files with future timestamp values 
cannot be fixed. 

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Attachments: Primitive.rar
>
>
> After a delete on a table that has SDK segments added through alter table add 
> segment, count(*) returns 0. It stays 0 no matter how many SDK segments are 
> added afterwards.
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
> --before executing the alter add segment below, place the attached SDK files in HDFS under the /sdkfiles/primitive2 folder;
> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> select * from external_primitive;
> delete from external_primitive where id =2;
> select * from external_primitive;
> Console output:
> /> drop table if exists external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, 
> salary double, active boolean, dob date, doj timestamp, city string, dept 
> string) stored as carbondata;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select
>  * from external_primitive;
> +-+
> | Result |
> +-+
> +-+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
> | 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT    |
> | 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi     | DATA  |
> | 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2;select * from 
> external_primitive;
> INFO : Execution ID: 322
> ++
> | Deleted Row Count |
> ++
> | 1 |
> ++
> 1 row selected (3.723 seconds)
> +-+---+---+-+-+--+--+---+---+
> | id | name | rank | salary | active | dob | doj | city | dept |
> +-+---+---+-+-+--+--+---+---+
> +-+---+---+-+-+--+--+---+---+
> No rows selected (1.531 seconds)
> /> alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select
>  * from external_primitive;
> +-+
> | Result |
> +-+
> 

[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2020-10-30 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-4029:
--
Description: 
After a delete on a table that has SDK segments added through alter table add 
segment, count(*) returns 0. It stays 0 no matter how many SDK segments are 
added afterwards.

Test queries:

drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary 
double, active boolean, dob date, doj timestamp, city string, dept string) 
stored as carbondata;

--before executing the alter add segment below, place the attached SDK files in HDFS under the /sdkfiles/primitive2 folder;

alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
select * from external_primitive;
delete from external_primitive where id =2;
select * from external_primitive;

Console output:

/> drop table if exists external_primitive;
+-+
| Result |
+-+
+-+
No rows selected (1.586 seconds)
/> create table external_primitive (id int, name string, rank smallint, salary 
double, active boolean, dob date, doj timestamp, city string, dept string) 
stored as carbondata;
+-+
| Result |
+-+
+-+
No rows selected (0.774 seconds)

/> alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select 
* from external_primitive;
+-+
| Result |
+-+
+-+
No rows selected (1.077 seconds)
INFO : Execution ID: 320
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune      | IT    |
| 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA  |
| 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune      | DATA  |
| 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi     | MAINS |
| 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi     | IT    |
| 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA  |
| 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune      | IT    |
| 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA  |
| 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune      | DATA  |
| 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
| 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT    |
| 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi     | DATA  |
| 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
13 rows selected (2.458 seconds)
/> delete from external_primitive where id =2;select * from external_primitive;
INFO : Execution ID: 322
++
| Deleted Row Count |
++
| 1 |
++
1 row selected (3.723 seconds)
+-+---+---+-+-+--+--+---+---+
| id | name | rank | salary | active | dob | doj | city | dept |
+-+---+---+-+-+--+--+---+---+
+-+---+---+-+-+--+--+---+---+
No rows selected (1.531 seconds)
/> alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select 
* from external_primitive;
+-+
| Result |
+-+
+-+
No rows selected (0.766 seconds)
+-+---+---+-+-+--+--+---+---+
| id | name | rank | salary | active | dob | doj | city | dept |
+-+---+---+-+-+--+--+---+---+
+-+---+---+-+-+--+--+---+---+
No rows selected (1.439 seconds)
/> select count(*) from external_primitive;
INFO : Execution ID: 335
+---+
| count(1) |
+---+
| 0 |
+---+
1 row selected (1.278 seconds)
/>
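
The counts in the console output above can be cross-checked with simple arithmetic: 13 rows before the delete and a deleted row count of 1 imply 12 remaining rows, whereas the select right after the delete returns none. A minimal sketch of that check follows; the function names are hypothetical, and the numbers are taken from the output above.

```python
def expected_rows_after_delete(rows_before: int, deleted: int) -> int:
    """Rows that should remain once `deleted` rows are removed."""
    if deleted < 0 or deleted > rows_before:
        raise ValueError("deleted must be between 0 and rows_before")
    return rows_before - deleted


def missing_rows(rows_before: int, deleted: int, observed: int) -> int:
    """Difference between the expected and the observed row count."""
    return expected_rows_after_delete(rows_before, deleted) - observed


if __name__ == "__main__":
    # 13 rows selected before the delete, Deleted Row Count = 1,
    # and no rows selected afterwards.
    print(expected_rows_after_delete(13, 1))
    print(missing_rows(13, 1, 0))
```

A non-zero result from `missing_rows` indicates that the query is dropping rows beyond those the delete removed.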

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
>   

[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2020-10-30 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-4029:
--
Description: (was: We are getting Number format exception while 
querying on the date columns. Attached the SDK files also.

Test queries:

--SDK compaction;
 drop table if exists external_primitive;
 create table external_primitive (id int, name string, rank smallint, salary 
double, active boolean, dob date, doj timestamp, city string, dept string) 
stored as carbondata;
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
 
 alter table external_primitive compact 'minor'; --working fine pass;
 select count(*) from external_primitive;--working fine pass;

show segments for table external_primitive;
 select * from external_primitive limit 13; --working fine pass;
 select * from external_primitive limit 14; --failed getting number format 
exception;
select min(dob) from external_primitive; --failed getting number format 
exception;
select max(dob) from external_primitive; --working;
select dob from external_primitive; --failed getting number format exception;

Console:

*0: /> show segments for table external_primitive;*
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| ID  | Status    | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| 4   | Success   | 2020-10-13 11:52:04.012 | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 3   | Compacted | 2020-10-13 11:52:00.587 | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 2   | Compacted | 2020-10-13 11:51:57.767 | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 1   | Compacted | 2020-10-13 11:51:54.678 | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 0.1 | Success   | 2020-10-13 11:52:05.986 | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
| 0   | Compacted | 2020-10-13 11:51:51.072 | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
6 rows selected (0.45 seconds)
*0: /> select * from external_primitive limit 13;* --working fine pass;
INFO : Execution ID: 95
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune      | IT    |
| 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA  |
| 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune      | DATA  |
| 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi     | MAINS |
| 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi     | IT    |
| 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA  |
| 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune      | IT    |
| 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA  |
| 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune      | DATA  |
| 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
| 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT    |
| 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi     | DATA  |
| 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 09:30:20.0 | Bangalore | MAINS |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
13 rows selected (1.775 seconds)
*0: /> select * from external_primitive limit 14;* --failed getting number 
format exception;
INFO : Execution ID: 97
*java.lang.NumberFormatException: For input string: "776"*
 at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
 at java.lang.Integer.parseInt(Integer.java:569)
 at java.lang.Integer.parseInt(Integer.java:615)
 at java.sql.Date.valueOf(Date.java:133)
 at 

[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.

2020-10-30 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-4029:
--
Summary: After delete in the table which has Alter-added SDK segments, then 
the count(*) is 0.  (was: Getting Number format exception while querying on 
date columns in SDK carbon table.)

> After delete in the table which has Alter-added SDK segments, then the 
> count(*) is 0.
> -
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Attachments: Primitive.rar
>
>
> We are getting a NumberFormatException while querying the date columns. The 
> SDK files are attached.
> Test queries:
> --SDK compaction;
>  drop table if exists external_primitive;
>  create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
>  
>  alter table external_primitive compact 'minor'; --working fine pass;
>  select count(*) from external_primitive;--working fine pass;
> show segments for table external_primitive;
>  select * from external_primitive limit 13; --working fine pass;
>  select * from external_primitive limit 14; --failed getting number format 
> exception;
> select min(dob) from external_primitive; --failed getting number format 
> exception;
> select max(dob) from external_primitive; --working;
> select dob from external_primitive; --failed getting number format exception;
> Console:
> *0: /> show segments for table external_primitive;*
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | ID  | Status    | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | 4   | Success   | 2020-10-13 11:52:04.012 | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 3   | Compacted | 2020-10-13 11:52:00.587 | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 2   | Compacted | 2020-10-13 11:51:57.767 | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 1   | Compacted | 2020-10-13 11:51:54.678 | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 0.1 | Success   | 2020-10-13 11:52:05.986 | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
> | 0   | Compacted | 2020-10-13 11:51:51.072 | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> 6 rows selected (0.45 seconds)
> *0: /> select * from external_primitive limit 13;* --working fine pass;
> INFO : Execution ID: 95
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
> 

[jira] [Reopened] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.

2020-10-29 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran reopened CARBONDATA-4029:
---

> Getting Number format exception while querying on date columns in SDK carbon 
> table.
> ---
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Attachments: Primitive.rar
>
>
> We are getting a NumberFormatException while querying the date columns. The 
> SDK files are attached.
> Test queries:
> --SDK compaction;
>  drop table if exists external_primitive;
>  create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
>  
>  alter table external_primitive compact 'minor'; --working fine pass;
>  select count(*) from external_primitive;--working fine pass;
> show segments for table external_primitive;
>  select * from external_primitive limit 13; --working fine pass;
>  select * from external_primitive limit 14; --failed getting number format 
> exception;
> select min(dob) from external_primitive; --failed getting number format 
> exception;
> select max(dob) from external_primitive; --working;
> select dob from external_primitive; --failed getting number format exception;
> Console:
> *0: /> show segments for table external_primitive;*
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | ID  | Status    | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | 4   | Success   | 2020-10-13 11:52:04.012 | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 3   | Compacted | 2020-10-13 11:52:00.587 | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 2   | Compacted | 2020-10-13 11:51:57.767 | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 1   | Compacted | 2020-10-13 11:51:54.678 | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 0.1 | Success   | 2020-10-13 11:52:05.986 | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
> | 0   | Compacted | 2020-10-13 11:51:51.072 | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> 6 rows selected (0.45 seconds)
> *0: /> select * from external_primitive limit 13;* --working fine pass;
> INFO : Execution ID: 95
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
> | 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT    |
> | 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi     | DATA  |
> | 20 | 

[jira] [Closed] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.

2020-10-29 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-4029.
-
Resolution: Won't Fix

> Getting Number format exception while querying on date columns in SDK carbon 
> table.
> ---
>
> Key: CARBONDATA-4029
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Environment: 3 node FI cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Attachments: Primitive.rar
>
>
> We are getting a NumberFormatException while querying the date columns. The 
> SDK files are attached.
> Test queries:
> --SDK compaction;
>  drop table if exists external_primitive;
>  create table external_primitive (id int, name string, rank smallint, salary 
> double, active boolean, dob date, doj timestamp, city string, dept string) 
> stored as carbondata;
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
>  alter table external_primitive add segment 
> options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
>  
>  alter table external_primitive compact 'minor'; --works;
>  select count(*) from external_primitive; --works;
> show segments for table external_primitive;
>  select * from external_primitive limit 13; --works;
>  select * from external_primitive limit 14; --fails with NumberFormatException;
> select min(dob) from external_primitive; --fails with NumberFormatException;
> select max(dob) from external_primitive; --works;
> select dob from external_primitive; --fails with NumberFormatException;
> Console:
> *0: /> show segments for table external_primitive;*
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | ID  | Status    | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> | 4   | Success   | 2020-10-13 11:52:04.012 | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 3   | Compacted | 2020-10-13 11:52:00.587 | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 2   | Compacted | 2020-10-13 11:51:57.767 | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 1   | Compacted | 2020-10-13 11:51:54.678 | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
> | 0.1 | Success   | 2020-10-13 11:52:05.986 | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
> | 0   | Compacted | 2020-10-13 11:51:51.072 | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
> +-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
> 6 rows selected (0.45 seconds)
> *0: /> select * from external_primitive limit 13;* --working fine pass;
> INFO : Execution ID: 95
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
> +----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
> | 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune      | IT    |
> | 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA  |
> | 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune      | DATA  |
> | 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi     | MAINS |
> | 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi     | IT    |
> | 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA  |
> | 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune      | IT    |
> | 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA  |
> | 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune      | DATA  |
> | 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
> | 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT    |
> | 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi

[jira] [Closed] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Database is n

2020-10-29 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-3937.
-

> Insert into select from another carbon /parquet table is not working on Hive 
> Beeline on a newly create Hive write format - carbon table. We are getting 
> “Database is not set" error.
> 
>
> Key: CARBONDATA-3937
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3937
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
>
> Insert into select from another carbon or parquet table into a carbon table 
> does not work in Hive Beeline on a newly created Hive write format carbon 
> table. We get a “Database is not set” error.
>  
> Test queries:
>  drop table if exists hive_carbon;
> create table hive_carbon(id int, name string, scale decimal, country string, 
> salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
> insert into hive_carbon select 1,"Ram","2.3","India",3500;
> insert into hive_carbon select 2,"Raju","2.4","Russia",3600;
> insert into hive_carbon select 3,"Raghu","2.5","China",3700;
> insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;
>  
> drop table if exists hive_carbon2;
> create table hive_carbon2(id int, name string, scale decimal, country string, 
> salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
> insert into hive_carbon2 select * from hive_carbon;
> select * from hive_carbon;
> select * from hive_carbon2;
>  
>  --execute below queries in spark-beeline;
> create table hive_table(id int, name string, scale decimal, country string, 
> salary double);
>  create table parquet_table(id int, name string, scale decimal, country 
> string, salary double) stored as parquet;
>  insert into hive_table select 1,"Ram","2.3","India",3500;
>  select * from hive_table;
>  insert into parquet_table select 1,"Ram","2.3","India",3500;
>  select * from parquet_table;
> --execute the below query in hive beeline;
> insert into hive_carbon select * from parquet_table;
> The logs are attached for reference. But the insert into select from the 
> parquet and hive table into carbon table is working fine.
>  
> Only insert into select from a hive table into a carbon table works.
> Error details from the MR job run by the hive query:
> Error: java.io.IOException: java.io.IOException: Database name is not set.
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
>  at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
>  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
>  at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
>  at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>  at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> Caused by: java.io.IOException: Database name is not set.
>  at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
>  at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
>  at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
>  at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
>  at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
>  ... 9 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (CARBONDATA-3819) Fileformat column details is not present in the show segments DDL for heterogenous segments table.

2020-10-29 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-3819.
-

Fixed and verified.

> Fileformat column details is not present in the show segments DDL for 
> heterogenous segments table.
> --
>
> Key: CARBONDATA-3819
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3819
> Project: CarbonData
>  Issue Type: Bug
> Environment: Opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Attachments: fileformat_notworking_actualresult.PNG, 
> fileformat_working_expected.PNG
>
>
> Fileformat column details are not present in the show segments DDL for a 
> heterogenous segments table.
> Test steps: 
>  # Create a heterogenous table with added parquet and carbon segments.
>  # Run show segments. 
> Expected results:
> It should show the "FileFormat" column details in the show segments DDL.
> Actual result: 
> It does not show the File format column details in the show segments DDL.
> See the attached screenshots for more details.
>  
>  





[jira] [Resolved] (CARBONDATA-3819) Fileformat column details is not present in the show segments DDL for heterogenous segments table.

2020-10-29 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran resolved CARBONDATA-3819.
---
Resolution: Fixed

This issue is fixed in the latest Carbon jars - 2.0.0.
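
The check described in the test steps could be repeated roughly as below. This is only a sketch: the table name hetero_demo and the parquet path are hypothetical, and the add segment syntax is modelled on the statements used in the other issues in this thread.

```sql
-- Hypothetical table, for illustration only.
create table hetero_demo (id int, name string) stored as carbondata;

-- A normal load produces a carbon segment.
insert into hetero_demo select 1, 'a';

-- Add an external parquet segment; the path is hypothetical and must
-- point at parquet files whose schema matches the table.
alter table hetero_demo add segment
options('path'='hdfs://hacluster/tmp/parquet_demo','format'='parquet');

-- With the fix, the output is expected to include a File Format column.
show segments for table hetero_demo;
```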






[jira] [Closed] (CARBONDATA-4012) Documentations issues.

2020-10-22 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-4012.
-

Details of the complex type features have been added to the open-source documentation and verified.

> Documentations issues.
> --
>
> Key: CARBONDATA-4012
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4012
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.0
>
>
> Reading Array and Struct of all primitive types on Presto from Spark Carbon 
> tables is supported. Details of this feature have to be added to the 
> open-source guide below:
> [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]





[jira] [Closed] (CARBONDATA-4024) Select queries with filter and aggregate queries are not working in Hive write - carbon table.

2020-10-22 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-4024.
-
Resolution: Duplicate

> Select queries with filter and aggregate queries are not working in Hive 
> write - carbon table. 
> ---
>
> Key: CARBONDATA-4024
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4024
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
>
> Select queries with filter and aggregate queries are not working in Hive 
> write - carbon table.
> Hive - console:
> 0: /> use t2;
> INFO : State: Compiling.
> INFO : Compiling command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use t2; 
> Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : hive.compile.auto.avoid.cbo=true
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO : Completed compiling command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time taken: 0.122 seconds
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : State: Executing.
> INFO : Executing command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use t2; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : Starting task [Stage-0:DDL] in serial mode
> INFO : Completed executing command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time taken: 0.019 seconds
> INFO : OK
> INFO : Concurrency mode is disabled, not creating a lock manager
> No rows affected (0.207 seconds)
> 0: /> show tables;
> INFO : State: Compiling.
> INFO : Compiling command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show tables; 
> Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : hive.compile.auto.avoid.cbo=true
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
> INFO : Completed compiling command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time taken: 0.015 seconds
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : State: Executing.
> INFO : Executing command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show tables; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : Starting task [Stage-0:DDL] in serial mode
> INFO : Completed executing command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time taken: 0.016 seconds
> INFO : OK
> INFO : Concurrency mode is disabled, not creating a lock manager
> +---------------+
> | tab_name      |
> +---------------+
> | hive_carbon   |
> | hive_table    |
> | parquet_table |
> +---------------+
> 3 rows selected (0.114 seconds)
> 0: /> select * from hive_carbon;
> INFO : State: Compiling.
> INFO : Compiling command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): select * from hive_carbon; 
> Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : hive.compile.auto.avoid.cbo=true
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : Current sql is not contains insert syntax, not need record dest table flag
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_carbon.id, type:int, comment:null), FieldSchema(name:hive_carbon.name, type:string, comment:null), FieldSchema(name:hive_carbon.scale, type:decimal(10,0), comment:null), FieldSchema(name:hive_carbon.country, type:string, comment:null), FieldSchema(name:hive_carbon.salary, type:double, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time taken: 0.511 seconds
> INFO : Concurrency mode is disabled, not creating a lock manager
> INFO : State: Executing.
> INFO : Executing command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): select * from hive_carbon; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
> INFO : Completed executing command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time taken: 0.001 seconds
> INFO : OK
> INFO : Concurrency mode is disabled, not creating a lock manager
> 

[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. Even the aggregate queries are not working.

2020-10-21 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3938:
--
Description: 
In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --not working, returning 0 rows even though 2000 rows are there; --Issue 1 on Hive read format table;

select * from uniqdata1; --returns no rows; --Issue 2-a: full scan on Hive read format table;

select cust_id from uniqdata1 limit 5; --returns no rows; --Issue 2-b: select query with projection, not working;

The logs are attached for reference.

With the Hive write table, the aggregate and filter queries do not work, but 
select * full scan queries work.

All three kinds of query (full scan - select *, filter queries and aggregate 
queries) fail on the Hive read format table.

This issue also exists when a normal carbon table (created through stored as 
carbondata) is created in Spark and the data is read through a select query 
from Hive beeline.

  was:
In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --not working, returning 0 rows even though 2000 rows are there; --Issue 1 on Hive read format table;

select * from uniqdata1; --returns no rows; --Issue 2-a: full scan on Hive read format table;

select cust_id from uniqdata1 limit 5; --returns no rows; --Issue 2-b: select query with projection, not working;

The logs are attached for reference. With the Hive write table this issue is 
not seen. The issue is only seen in the Hive read format table.

This issue also exists when a normal carbon table is created in Spark and read 
through Hive beeline.


> In Hive read table, we are unable to read a projection 

[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-10-14 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213673#comment-17213673
 ] 

Prasanna Ravichandran commented on CARBONDATA-3807:
---

Sample plan with bloom details (the screenshot could not be attached):

== CarbonData Profiler ==
Table Scan on uniqdata
 - total: 2 blocks, 2 blocklets
 - filter: (cust_name <> null and cust_name = CUST_NAME_0)
 - pruned by Main Index
    - skipped: 0 blocks, 0 blocklets
 *- pruned by CG Index*
    *- name: datamapuniq_b1*
    *- provider: bloomfilter*
    - skipped: 0 blocks, 0 blocklets

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- HashAggregate(keys=[], functions=[count(1)])
   +- Exchange SinglePartition, true, [id=#129]
      +- HashAggregate(keys=[], functions=[partial_count(1)])
         +- Project
            +- Scan carbondata default.uniqdata[] PushedFilters: [IsNotNull(cust_name), EqualTo(cust_name,CUST_NAME_0)], ReadSchema: struct

> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
>  Issue Type: Bug
> Environment: Ant cluster - opensource
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: bloom-filtercolumn-plan.png, bloom-show index.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Closed] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-10-14 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-3807.
-
Fix Version/s: 2.0.0
   Resolution: Not A Bug

After enabling enable.query.statistics and re-verifying the plan, the bloom 
filter details are visible in the explain output. They appear in the plan only 
after both the bloom index creation and a load have happened. With only the 
bloom index created, they do not appear in the plan. 
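
The order that made the bloom details visible could be sketched as below, reusing the statements from the issue description. This is a sketch under assumptions: the uniqdata table is taken to exist already, and enable.query.statistics is taken to be set to true beforehand (how it is set, per session or in carbon.properties, depends on the deployment and is not shown here).

```sql
-- Assumption: enable.query.statistics=true is already in effect.

-- 1. Create the bloom index first (statement as used in the issue).
create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter'
PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');

-- 2. Load data after the index has been created.
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into
table uniqdata
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

-- 3. Check the plan: per the comment above, the "pruned by CG Index"
--    entry with provider bloomfilter is reported only at this point.
explain select count(*) from uniqdata where cust_name="CUST_NAME_0";
```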






[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-10-14 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213670#comment-17213670
 ] 

Prasanna Ravichandran commented on CARBONDATA-3807:
---

!bloom_issue_verification_after_load.PNG!






[jira] [Created] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.

2020-10-12 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-4029:
-

 Summary: Getting Number format exception while querying on date 
columns in SDK carbon table.
 Key: CARBONDATA-4029
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: 3 node FI cluster
Reporter: Prasanna Ravichandran
 Attachments: Primitive.rar

We get a NumberFormatException when querying date columns. The SDK files are 
attached.

Test queries:

--SDK compaction;
 drop table if exists external_primitive;
 create table external_primitive (id int, name string, rank smallint, salary 
double, active boolean, dob date, doj timestamp, city string, dept string) 
stored as carbondata;
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
 alter table external_primitive add segment 
options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
 
 alter table external_primitive compact 'minor'; --works;
 select count(*) from external_primitive; --works;

show segments for table external_primitive;
 select * from external_primitive limit 13; --works;
 select * from external_primitive limit 14; --fails with NumberFormatException;
select min(dob) from external_primitive; --fails with NumberFormatException;
select max(dob) from external_primitive; --works;
select dob from external_primitive; --fails with NumberFormatException;

Console:

*0: /> show segments for table external_primitive;*
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| ID  | Status    | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| 4   | Success   | 2020-10-13 11:52:04.012 | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 3   | Compacted | 2020-10-13 11:52:00.587 | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 2   | Compacted | 2020-10-13 11:51:57.767 | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 1   | Compacted | 2020-10-13 11:51:54.678 | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 0.1 | Success   | 2020-10-13 11:52:05.986 | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
| 0   | Compacted | 2020-10-13 11:51:51.072 | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
+-----+-----------+-------------------------+-----------------+-----------+-----------+------------+-------------+
6 rows selected (0.45 seconds)
*0: /> select * from external_primitive limit 13;* --working fine pass;
INFO : Execution ID: 95
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| id | name | rank | salary          | active | dob        | doj                   | city      | dept  |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
| 1  | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune      | IT    |
| 2  | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA  |
| 3  | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune      | DATA  |
| 4  | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi     | MAINS |
| 5  | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi     | IT    |
| 6  | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA  |
| 7  | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune      | IT    |
| 8  | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA  |
| 9  | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune      | DATA  |
| 10 | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS |
| 18 |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT    |
| 19 |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi     | DATA  |
| 20 | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 09:30:20.0 | Bangalore | MAINS |
+----+------+------+-----------------+--------+------------+-----------------------+-----------+-------+
13 rows selected (1.775 seconds)
*0: /> select * from external_primitive limit 14;* --failed getting number format exception;
INFO : Execution ID: 97
*java.lang.NumberFormatException: For 

[jira] [Created] (CARBONDATA-4024) Select queries with filter and aggregate queries are not working in Hive write - carbon table.

2020-10-07 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-4024:
-

 Summary: Select queries with filter and aggregate queries are not 
working in Hive write - carbon table. 
 Key: CARBONDATA-4024
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4024
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 2.0.0
Reporter: Prasanna Ravichandran


Select queries with a filter and aggregate queries do not work on a Hive write 
format carbon table.

Hive - console:

0: /> use t2;
INFO : State: Compiling.
INFO : Compiling 
command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use 
t2; 
Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : hive.compile.auto.avoid.cbo=true
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling 
command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time 
taken: 0.122 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : State: Executing.
INFO : Executing 
command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use 
t2; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing 
command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time 
taken: 0.019 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.207 seconds)
0: /> show tables;
INFO : State: Compiling.
INFO : Compiling 
command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show 
tables; 
Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : hive.compile.auto.avoid.cbo=true
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, 
type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling 
command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time 
taken: 0.015 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : State: Executing.
INFO : Executing 
command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show 
tables; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing 
command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time 
taken: 0.016 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+---------------+
| tab_name      |
+---------------+
| hive_carbon   |
| hive_table    |
| parquet_table |
+---------------+
3 rows selected (0.114 seconds)
0: /> select * from hive_carbon;
INFO : State: Compiling.
INFO : Compiling 
command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): 
select * from hive_carbon; 
Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : hive.compile.auto.avoid.cbo=true
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Current sql is not contains insert syntax, not need record dest table 
flag
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:hive_carbon.id, type:int, comment:null), 
FieldSchema(name:hive_carbon.name, type:string, comment:null), 
FieldSchema(name:hive_carbon.scale, type:decimal(10,0), comment:null), 
FieldSchema(name:hive_carbon.country, type:string, comment:null), 
FieldSchema(name:hive_carbon.salary, type:double, comment:null)], 
properties:null)
INFO : Completed compiling 
command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time 
taken: 0.511 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : State: Executing.
INFO : Executing 
command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): 
select * from hive_carbon; Current 
sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4
INFO : Completed executing 
command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time 
taken: 0.001 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
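+----------------+------------------+-------------------+---------------------+--------------------+
| hive_carbon.id | hive_carbon.name | hive_carbon.scale | hive_carbon.country | hive_carbon.salary |
+----------------+------------------+-------------------+---------------------+--------------------+
| 1              | Ram              | 2                 | India               | 3500.0             |
+----------------+------------------+-------------------+---------------------+--------------------+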
1 row selected (0.614 seconds)
0: /> select * from hive_carbon where 

[jira] [Updated] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Database is

2020-10-07 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3937:
--
Description: 
Insert into select from another carbon or parquet table into a carbon table 
does not work in Hive Beeline on a newly created Hive write format carbon 
table. We get a “Database name is not set” error.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 --execute below queries in spark-beeline;

create table hive_table(id int, name string, scale decimal, country string, 
salary double);
 create table parquet_table(id int, name string, scale decimal, country string, 
salary double) stored as parquet;
 insert into hive_table select 1,"Ram","2.3","India",3500;
 select * from hive_table;
 insert into parquet_table select 1,"Ram","2.3","India",3500;
 select * from parquet_table;

--execute the below query in hive beeline;

insert into hive_carbon select * from parquet_table;

Attached the logs for your reference. But the insert into select from the 
parquet and hive table into carbon table is working fine.

 

Only insert into select from a hive table into a carbon table is working.

Error details in MR job which run through hive query:

Error: java.io.IOException: java.io.IOException: Database name is not set.
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: Database name is not set.
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
    ... 9 more
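
The trace above fails inside CarbonInputFormat.getDatabaseName, which suggests that the job configuration reaching the record reader carries no database name. The following is a minimal Python sketch of that kind of check, not the CarbonData implementation. The configuration is modelled as a plain dict, the key is the one used in the SERDEPROPERTIES of the CARBONDATA-3938 report, and whether the real method reads exactly that key is an assumption. The describe helper is hypothetical.

```python
# Key name taken from the SERDEPROPERTIES used elsewhere in this thread.
# That the real lookup uses exactly this key is an assumption.
DATABASE_NAME_KEY = "mapreduce.input.carboninputformat.databaseName"


def get_database_name(conf):
    """Return the database name from a job configuration (a plain dict here)."""
    name = conf.get(DATABASE_NAME_KEY)
    if name is None:
        # Same message as in the trace; the real check lives in CarbonInputFormat.
        raise IOError("Database name is not set.")
    return name


def describe(conf):
    """Hypothetical helper: report the outcome instead of failing the job."""
    try:
        return "database=" + get_database_name(conf)
    except IOError as err:
        return "error: " + str(err)


print(describe({DATABASE_NAME_KEY: "default"}))
print(describe({}))
```

If the insert from a plain hive table works while the insert from a carbon or parquet table does not, comparing the job configuration of the two runs for this key may help narrow down where the value is lost.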

  was:
Insert into select from another carbon or parquet table to a carbon table is 
not working on Hive Beeline on a newly create Hive write format carbon table. 
We are getting “Database is not set” error.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 --execute below queries in spark-beeline;

create table hive_table(id int, name string, scale decimal, country string, 
salary double);
 create table parquet_table(id int, name string, scale decimal, country string, 
salary double) stored as parquet;
 insert into hive_table select 1,"Ram","2.3","India",3500;
 select * from hive_table;
 insert into 

[jira] [Updated] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Database is

2020-10-07 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3937:
--
Description: 
Insert into select from another carbon or parquet table into a carbon table 
does not work in Hive Beeline on a newly created Hive write format carbon 
table. We get a “Database name is not set” error.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 --execute below queries in spark-beeline;

create table hive_table(id int, name string, scale decimal, country string, 
salary double);
 create table parquet_table(id int, name string, scale decimal, country string, 
salary double) stored as parquet;
 insert into hive_table select 1,"Ram","2.3","India",3500;
 select * from hive_table;
 insert into parquet_table select 1,"Ram","2.3","India",3500;
 select * from parquet_table;

--execute the below query in hive beeline;

insert into hive_carbon select * from parquet_table;

Attached the logs for your reference. But the insert into select from the 
parquet and hive table into carbon table is working fine.

 

Error details in MR job which run through hive query:

Error: java.io.IOException: java.io.IOException: Database name is not set.
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: Database name is not set.
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
    ... 9 more

  was:
Insert into select from another carbon table is not working on Hive Beeline on 
a newly create Hive write format carbon table. We are getting “Carbondata files 
not found error”.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 --execute below queries in spark-beeline;

create table hive_table(id int, name string, scale decimal, country string, 
salary double);
create table parquet_table(id int, name string, scale decimal, country string, 
salary double) stored as parquet;
insert into hive_table select 1,"Ram","2.3","India",3500;
select * from hive_table;
insert into parquet_table select 1,"Ram","2.3","India",3500;
select * from parquet_table;

--execute the below query in hive 

[jira] [Updated] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Database is

2020-10-07 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3937:
--
Description: 
Insert into select from another carbon table does not work in Hive Beeline on 
a newly created Hive write format carbon table. We get a “Carbondata files 
not found” error.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 --execute below queries in spark-beeline;

create table hive_table(id int, name string, scale decimal, country string, 
salary double);
create table parquet_table(id int, name string, scale decimal, country string, 
salary double) stored as parquet;
insert into hive_table select 1,"Ram","2.3","India",3500;
select * from hive_table;
insert into parquet_table select 1,"Ram","2.3","India",3500;
select * from parquet_table;

--execute the below query in hive beeline;

insert into hive_carbon select * from parquet_table;

Attached the logs for your reference. But the insert into select from the 
parquet and hive table into carbon table is working fine.

 

Error details in MR job which run through hive query:

Error: java.io.IOException: java.io.IOException: Database name is not set.
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: Database name is not set.
    at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
    at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
    ... 9 more

  was:
Insert into select from another carbon table is not working on Hive Beeline on 
a newly create Hive write format carbon table. We are getting “Carbondata files 
not found error”.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 

Attached the logs for your reference. But the insert into select from the 
parquet and hive table into carbon table is working fine.

 

Error details in MR job which run through hive query:

Error: java.io.IOException: java.io.IOException: CarbonData file is not present 
in the table location at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at 

[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. Even the aggregate queries are not working.

2020-10-07 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3938:
--
Description: 
In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --not working, returns 0 rows even though 2000 rows are there; --Issue 1 on Hive read format table;

select * from uniqdata1; --Returns no rows; --Issue 2-a: full scan on Hive read format table;

select cust_id from uniqdata1 limit 5; --Returns no rows; --Issue 2-b: select query with projection not working, returns no rows;

 Attached the logs for your reference. With the Hive write table this issue is 
not seen. Issue is only seen in Hive read format table.

This issue also exists when a normal carbon table is created in Spark and read 
through Hive beeline.

  was:
In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --Returns 2000;

select count(*) from uniqdata; --Returns 2000 - working fine;

select * from uniqdata1; --Return no rows; --Issue 1 on Hive read format table;

select * from uniqdata; --Returns no rows; --Issue 2 while reading a normal carbon table created in spark;

select cust_id from uniqdata1 limit 5; --Return no rows;

 Attached the logs for your reference. With the Hive write table this issue is 
not seen. Issue is only seen in Hive read format table.

This issue also exists when a normal carbon table is created in Spark and read 
through Hive beeline.

Summary: In Hive read table, we are unable to read a projection column 
or read a full scan - select * query. Even the aggregate queries are not 
working.  (was: In Hive read table, we are unable to read a projection column 
or read a full scan - select * query. 

[jira] [Created] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.

2020-10-01 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-4022:
-

 Summary: Getting the error - "PathName is not a valid DFS 
filename." with index server and after adding carbon SDK segments and then 
doing select/update/delete operations.
 Key: CARBONDATA-4022
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4022
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanna Ravichandran


We get the error "PathName is not a valid DFS filename." during 
update/delete/select queries on a table with an added SDK segment. The path 
shown in the error is malformed (the external segment path is appended directly 
to the table path), which is the cause of the error. This is seen only when the 
index server is running and disable fallback is true.
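
The Pathname in the error further down has the external segment location glued onto the table path (`.../sdk_2level_1hdfs:/hacluster/sdkfiles/...`). The Python sketch below only illustrates how that shape of path can arise and how a scheme check would avoid it. It is not CarbonData code, and where the real concatenation happens on the index server path is not known from this report. Both function names are hypothetical.

```python
from urllib.parse import urlparse

# Both values are copied from the error message in this report.
TABLE_PATH = "hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1"
SEGMENT_DIR = "hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray"


def naive_join(table_path, segment_path):
    """Blind concatenation: treats every segment path as relative to the table."""
    return table_path + segment_path


def resolve(table_path, segment_path):
    """Hypothetical guard: a path that already has a scheme is returned unchanged."""
    if urlparse(segment_path).scheme:
        return segment_path
    return table_path.rstrip("/") + "/" + segment_path.lstrip("/")


print(naive_join(TABLE_PATH, SEGMENT_DIR))
print(resolve(TABLE_PATH, SEGMENT_DIR))
print(resolve(TABLE_PATH, "Segment_0"))  # hypothetical relative segment path
```

The first call reproduces the `sdk_2level_1hdfs:` fragment seen in the error, while the guarded version leaves the absolute segment location untouched and only prefixes relative paths.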

Queries and errors:

> create table sdk_2level_1(name string, rec1 
> struct>) stored as carbondata;
+-+
| Result |
+-+
+-+
No rows selected (0.425 seconds)
> alter table sdk_2level_1 add segment 
> options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata');
+-+
| Result |
+-+
+-+
No rows selected (0.77 seconds)
> select * from sdk_2level_1;
INFO : Execution ID: 1855
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in stage 600.0 (TID 21345, linux, executor 16): java.lang.IllegalArgumentException: Pathname /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata from hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188)
    at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100)
    at org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60)
    at org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65)
    at org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902)
    at org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
    at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
    at org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47)
    at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:301)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:293)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
 at 

[jira] [Created] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.

2020-10-01 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-4021:
-

 Summary: With Index server running, Upon executing count* we are 
getting the below error, after adding the parquet and ORC segment. 
 Key: CARBONDATA-4021
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4021
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanna Ravichandran


We get the issue below when the index server is enabled and index server 
fallback disable is configured as true. count(*) fails with the error below 
after adding the parquet and ORC segments.

Queries and error:

> use rps;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.054 seconds)

> drop table if exists uniqdata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.229 seconds)

> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.756 seconds)

> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
INFO  : Execution ID: 95
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.789 seconds)

> use default;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.052 seconds)

> drop table if exists uniqdata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.122 seconds)

> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.508 seconds)

> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
INFO  : Execution ID: 108
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.316 seconds)

> drop table if exists uniqdata_parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.668 seconds)

> CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> String,active_emui_version string, dob timestamp, doj timestamp, 
> bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> decimal_column2 decimal(36,36),double_column1 double, double_column2 
> double,integer_column1 int) stored as parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.397 seconds)

> insert into uniqdata_parquet select * from uniqdata;
INFO  : Execution ID: 116
+---------+
| Result  |
+---------+
+---------+
No rows selected (4.805 seconds)

> drop table if exists uniqdata_orc;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.553 seconds)

> CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) using orc;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.396 seconds)

> insert into uniqdata_orc select * from uniqdata;
INFO  : Execution ID: 122
+---------+
| Result  |
+---------+
+---------+
No rows selected (3.403 seconds)

> use rps;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.06 seconds)

> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
INFO  : Execution ID: 126
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.511 seconds)

> Alter table uniqdata add segment options 
> ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_orc','format'='orc');
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.716 seconds)

> select count(*) from uniqdata;

Error: java.io.IOException: 
org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
java.security.PrivilegedActionException: org.apache.spark.SparkException: Job 
aborted due to stage failure: Task 2 in stage 54.0 failed 4 times, most recent 
failure: Lost task 2.3 

[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.

2020-10-01 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205497#comment-17205497
 ] 

Prasanna Ravichandran commented on CARBONDATA-3914:
---

!image-2020-10-01-18-37-20-242.png!

> We are getting the below error when executing select query on a carbon table 
> when no data is returned from hive beeline.
> 
>
> Key: CARBONDATA-3914
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3914
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
> Environment: 3 node One track ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Nodatareturnedfromcarbontable-IOexception.png
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> If a carbon table contains no data, select queries on it throw the 
> IOException below. On a Hive table, select queries work even when the table 
> holds no data.
> Expected result: Even when the table holds no records, the query should 
> return 0 or no rows. It should not throw an error or exception.
> Actual result: An IOException is thrown - Unable to read carbon schema.
>  
> Attached the screenshot for your reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.

2020-10-01 Thread Prasanna Ravichandran (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205495#comment-17205495
 ] 

Prasanna Ravichandran commented on CARBONDATA-3914:
---

Attached the screenshot after the fix.



[jira] [Closed] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.

2020-10-01 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-3914.
-

This issue is now fixed. No error is thrown when the carbon table contains no 
rows.



[jira] [Created] (CARBONDATA-4012) Documentations issues.

2020-09-25 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-4012:
-

 Summary: Documentations issues.
 Key: CARBONDATA-4012
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4012
 Project: CarbonData
  Issue Type: Bug
Reporter: Prasanna Ravichandran


The feature "Support Array and Struct of all primitive types reading from 
Presto from Carbon tables" needs to be documented. Its details have to be added 
to the open-source guide below:

[https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]





[jira] [Updated] (CARBONDATA-4012) Documentations issues.

2020-09-25 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-4012:
--
Description: 
The feature "Support Array and Struct of all primitive types reading on Presto 
from Spark Carbon tables" needs to be documented. Its details have to be added 
to the open-source guide below:

[https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]


> Documentations issues.
> --
>
> Key: CARBONDATA-4012
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4012
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Prasanna Ravichandran
>Priority: Minor
>
> The feature "Support Array and Struct of all primitive types reading on 
> Presto from Spark Carbon tables" needs to be documented. Its details have to 
> be added to the open-source guide below:
> [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]





[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. But the aggregate queries are working fine.

2020-08-11 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3938:
--
Description: 
In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --Returns 2000;

select count(*) from uniqdata; --Returns 2000 - working fine;

select * from uniqdata1; --Returns no rows; --Issue 1, on the Hive read format table;

select * from uniqdata; --Returns no rows; --Issue 2, while reading a normal carbon 
table created in Spark;

select cust_id from uniqdata1 limit 5; --Returns no rows;

The logs are attached for reference. The issue is not seen with a Hive write 
format table; it is seen with the Hive read format table.

The issue also occurs when a normal carbon table is created in Spark and read 
through Hive beeline.


> In Hive read table, we are unable to read a projection column or read a full 
> scan - select * query. But the aggregate queries are working fine.
> ---
>
> Key: CARBONDATA-3938
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3938
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 2.0.0
>Reporter: Prasanna 

[jira] [Created] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. But the aggregate queries are working fine.

2020-07-31 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3938:
-

 Summary: In Hive read table, we are unable to read a projection 
column or read a full scan - select * query. But the aggregate queries are 
working fine.
 Key: CARBONDATA-3938
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3938
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 2.0.0
Reporter: Prasanna Ravichandran
 Attachments: Hive on MR - Read projection column issue.txt

In Hive read table, we are unable to read a projection column or full scan 
query. But the aggregate queries are working fine.

 

Test query:

 

--spark beeline;

drop table if exists uniqdata;

drop table if exists uniqdata1;

CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, 
DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) stored as carbondata ;

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata OPTIONS('DELIMITER'=',', 
'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');

CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 
bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH 
SERDEPROPERTIES 
('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata')
 STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' 
OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata';

select  count(*)  from uniqdata1;

 

 

--Hive Beeline;

select count(*) from uniqdata1; --Returns 2000;

select * from uniqdata1; --Return no rows;

select cust_id from uniqdata1 limit 5;--Return no rows;

The logs are attached for reference. The issue is not seen with a Hive write 
format table; it is seen only with the Hive read format table.





[jira] [Created] (CARBONDATA-3937) Insert into select from another carbon table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Carbondata files not

2020-07-31 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3937:
-

 Summary: Insert into select from another carbon table is not 
working on Hive Beeline on a newly create Hive write format - carbon table. We 
are getting “Carbondata files not found error"
 Key: CARBONDATA-3937
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3937
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 2.0.0
Reporter: Prasanna Ravichandran


Insert into select from another carbon table does not work in Hive Beeline on 
a newly created Hive write format carbon table. We get a “Carbondata files not 
found” error.

 

Test queries:

 drop table if exists hive_carbon;

create table hive_carbon(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon select 1,"Ram","2.3","India",3500;

insert into hive_carbon select 2,"Raju","2.4","Russia",3600;

insert into hive_carbon select 3,"Raghu","2.5","China",3700;

insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;

 

drop table if exists hive_carbon2;

create table hive_carbon2(id int, name string, scale decimal, country string, 
salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';

insert into hive_carbon2 select * from hive_carbon;

select * from hive_carbon;

select * from hive_carbon2;

 

 

The logs are attached for reference. Insert into select from a parquet or Hive 
table into a carbon table works fine.

 

Error details from the MR job run by the Hive query:

Error: java.io.IOException: java.io.IOException: CarbonData file is not present in the table location
 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
 at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
 at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: CarbonData file is not present in the table location
 at org.apache.carbondata.core.util.CarbonUtil.inferSchema(CarbonUtil.java:2141)
 at org.apache.carbondata.core.metadata.schema.SchemaReader.inferSchema(SchemaReader.java:139)
 at org.apache.carbondata.hive.MapredCarbonInputFormat.populateCarbonTable(MapredCarbonInputFormat.java:92)
 at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:104)
 at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:203)
 at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:192)
 at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
 ... 9 more





[jira] [Created] (CARBONDATA-3915) Correction in the documentation for spark-shell

2020-07-20 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3915:
-

 Summary: Correction in the documentation for spark-shell
 Key: CARBONDATA-3915
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3915
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 2.0.0
 Environment: 3 node ANT cluster one track.
Reporter: Prasanna Ravichandran


The spark-shell program given in 
[https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] does not 
work.

 

A working program is given below:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val newSpark = SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate()
newSpark.sql("drop table if exists hive_carbon").show
newSpark.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata").show
newSpark.sql("LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/samplehive.csv' INTO TABLE hive_carbon").show
newSpark.sql("SELECT * FROM hive_carbon").show()

Please update the 
[https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] page with 
the working program above, under the "Start Spark shell by running the 
following command in the Spark directory" section.

 





[jira] [Created] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.

2020-07-19 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3914:
-

 Summary: We are getting the below error when executing select 
query on a carbon table when no data is returned from hive beeline.
 Key: CARBONDATA-3914
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3914
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Affects Versions: 2.0.0
 Environment: 3 node One track ANT cluster
Reporter: Prasanna Ravichandran
 Attachments: Nodatareturnedfromcarbontable-IOexception.png

If a carbon table contains no data, select queries on it throw the IOException 
below. On a Hive table, select queries work even when the table holds no data.

Expected result: Even when the table holds no records, the query should return 
0 or no rows. It should not throw an error or exception.

Actual result: An IOException is thrown - Unable to read carbon schema.

 

Attached the screenshot for your reference.





[jira] [Created] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.

2020-07-16 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3908:
-

 Summary: When a carbon segment is added through the alter add 
segments query, then it is not accounting the added carbon segment values.
 Key: CARBONDATA-3908
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3908
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.0.0
 Environment: FI cluster and opensource cluster.
Reporter: Prasanna Ravichandran


When a carbon segment is added through the alter table add segment query, the 
rows in the added carbon segment are not counted. A count(*) restricted to the 
added segment always returns 0.

Test queries:

drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

--hdfs dfs -mkdir /uniqdata-carbon-segment;
--hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* 
/uniqdata-carbon-segment/
Alter table uniqdata add segment options 
('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');

select count(*) from uniqdata;--4000 expected as one load of 2000 records 
happened and same segment is added again;

set carbon.input.segments.default.uniqdata=1;
select count(*) from uniqdata;--2000 expected - it should just show the records 
count of added segments;

CONSOLE:

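/> set carbon.input.segments.default.uniqdata=1;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 1     |
+----------------------------------------+-------+
1 row selected (0.192 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1734
+----------+
| count(1) |
+----------+
| 2000     |
+----------+
1 row selected (4.036 seconds)
/> set carbon.input.segments.default.uniqdata=2;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 2     |
+----------------------------------------+-------+
1 row selected (0.088 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1745
+----------+
| count(1) |
+----------+
| 2000     |
+----------+
1 row selected (6.056 seconds)
/> set carbon.input.segments.default.uniqdata=3;
+----------------------------------------+-------+
| key                                    | value |
+----------------------------------------+-------+
| carbon.input.segments.default.uniqdata | 3     |
+----------------------------------------+-------+
1 row selected (0.161 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1753
+----------+
| count(1) |
+----------+
| 0        |
+----------+
1 row selected (4.875 seconds)
/> show segments for table uniqdata;
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| ID | Status  | Load Start Time         | Load Time Taken | Partition | Data Size | Index Size | File Format |
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+
| 4  | Success | 2020-07-17 16:01:53.673 | 5.579S          | {}        | 269.10KB  | 7.21KB     | columnar_v3 |
| 3  | Success | 2020-07-17 16:00:24.866 | 0.578S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
| 2  | Success | 2020-07-17 15:07:54.273 | 0.642S          | {}        | 36.72KB   | NA         | orc         |
| 1  | Success | 2020-07-17 15:03:59.767 | 0.564S          | {}        | 89.26KB   | NA         | parquet     |
| 0  | Success | 2020-07-16 12:44:32.095 | 4.484S          | {}        | 88.55KB   | 1.81KB     | columnar_v3 |
+----+---------+-------------------------+-----------------+-----------+-----------+------------+-------------+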

Expected result: Records added through the added carbon segment should be 
counted.

Actual result: Records added through the added carbon segment are not counted.
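
The per-segment check from the console output can be condensed into a short sketch. It assumes the table is default.uniqdata and that segment ID 3 is the added carbon segment, as in the show segments listing in this report. The reset to `*` is taken from the CarbonData segment management guide and is not part of the original report.

```sql
-- Restrict queries on default.uniqdata to the added carbon segment
-- (ID 3 in this report; use the ID shown by "show segments" on your table).
set carbon.input.segments.default.uniqdata=3;

-- The report expects the 2000 rows of the added segment; the console shows 0.
select count(*) from uniqdata;

-- Reset the filter so that later queries read all segments again
-- (assumption: reset syntax as documented for CarbonData segment management).
set carbon.input.segments.default.uniqdata=*;
select count(*) from uniqdata;
```

Running the restricted count once per segment ID separates segments that are counted from segments that are silently skipped.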





[jira] [Closed] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.

2020-05-20 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-3811.
-

Fixed.

> In Flat folder enabled table, it is returning no records while querying.
> 
>
> Key: CARBONDATA-3811
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3811
> Project: CarbonData
>  Issue Type: Bug
> Environment: opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: Flat_folder_returning_zero.png
>
>
> Flat folder table is returning no records for select queries.
>  
> Test queries:
> drop table if exists uniqdata1;
> CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata TBLPROPERTIES('flat_folder'='true');
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata1 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> select count(*) from uniqdata1;--0;
> select * from uniqdata1 limit 10;--0;





[jira] [Updated] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.

2020-05-18 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3827:
--
Description: 
This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0.

The Merge DDL does not work with the syntax described in CARBONDATA-3597.

Test queries: 
 drop table if exists uniqdata1;
 CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 drop table if exists uniqdata;
 CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
  
 merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not 
working, getting parse exception;

 >merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id;
Error: org.apache.spark.sql.AnalysisException: == Parser1: 
org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser ==
[1.1] failure: identifier matching regex (?i)EXPLAIN expected
merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id
^;
== Parser2: org.apache.spark.sql.execution.SparkSqlParser ==
mismatched input 'merge' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 
'EMPOWER', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 
'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 
'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 
'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 
'EXPORT', 'IMPORT', 'LOAD', 'HEALTHCHECK'}(line 1, pos 0)
== SQL ==
merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id
^^^; (state=,code=0)

 

 

 

 

  was:
This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0.

Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597

Test queries: 
 drop table if exists uniqdata1;
 CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 drop table if exists uniqdata;
 CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
  
 merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working;

 
 Attached the screenshot for your reference.

!image-2020-05-18-21-30-31-344.png!

> Merge DDL is not working as per the mentioned syntax.
> -
>
> Key: CARBONDATA-3827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3827
> Project: CarbonData
> Issue Type: Bug
> Reporter: Prasanna Ravichandran
> Priority: Major
>
> This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0.
> Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597
> Test queries: 
>  drop table if exists uniqdata1;
>  CREATE TABLE uniqdata1 (cust_id 

[jira] [Updated] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.

2020-05-18 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3827:
--
Description: 
This issue is seen with the open-source jars (Spark 2.4.5 and CarbonData 2.0).

The Merge DDL does not work with the syntax described in CARBONDATA-3597.

Test queries: 
 drop table if exists uniqdata1;
 CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 drop table if exists uniqdata;
 CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
  
 merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working;

 
 Attached the screenshot for your reference.

!image-2020-05-18-21-30-31-344.png!

  was:
This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0.

Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597

Test queries: 
drop table if exists uniqdata1;
CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 
merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working;

 
Attached the screenshot for your reference.


> Merge DDL is not working as per the mentioned syntax.
> -
>
> Key: CARBONDATA-3827
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3827
> Project: CarbonData
> Issue Type: Bug
> Reporter: Prasanna Ravichandran
> Priority: Major
>
> This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0.
> Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597
> Test queries: 
>  drop table if exists uniqdata1;
>  CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata1 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
>  drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as 

[jira] [Created] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.

2020-05-18 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3827:
-

             Summary: Merge DDL is not working as per the mentioned syntax.
                 Key: CARBONDATA-3827
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3827
             Project: CarbonData
          Issue Type: Bug
            Reporter: Prasanna Ravichandran


This issue is seen with the open-source jars (Spark 2.4.5 and CarbonData 2.0).

The Merge DDL does not work with the syntax described in CARBONDATA-3597.

Test queries: 
drop table if exists uniqdata1;
CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 
merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working;

 
Attached the screenshot for your reference.





[jira] [Created] (CARBONDATA-3826) Merge DDL is not working as per the mentioned syntax.

2020-05-18 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3826:
-

             Summary: Merge DDL is not working as per the mentioned syntax.
                 Key: CARBONDATA-3826
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3826
             Project: CarbonData
          Issue Type: Bug
            Reporter: Prasanna Ravichandran


This issue is seen with the open-source jars (Spark 2.4.5 and CarbonData 2.0).

The Merge DDL does not work with the syntax described in CARBONDATA-3597.

Test queries: 
drop table if exists uniqdata1;
CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
 
merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working;

 
Attached the screenshot for your reference.





[jira] [Created] (CARBONDATA-3819) Fileformat column details is not present in the show segments DDL for heterogenous segments table.

2020-05-12 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3819:
-

             Summary: Fileformat column details is not present in the show segments DDL for heterogenous segments table.
                 Key: CARBONDATA-3819
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3819
             Project: CarbonData
          Issue Type: Bug
         Environment: Opensource ANT cluster
            Reporter: Prasanna Ravichandran
         Attachments: fileformat_notworking_actualresult.PNG, fileformat_working_expected.PNG

The FileFormat column details are not shown by the SHOW SEGMENTS DDL for a 
table with heterogeneous segments.

Test steps: 
 # Create a table with heterogeneous segments by adding Parquet and carbon segments.
 # Run SHOW SEGMENTS. 

Expected result:

The SHOW SEGMENTS output should include the "FileFormat" column details.

Actual result: 

The SHOW SEGMENTS output does not include the "FileFormat" column details.

See the attached screenshots for more details.

 

 





[jira] [Updated] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3811:
--
Attachment: Flat_folder_returning_zero.png

> In Flat folder enabled table, it is returning no records while querying.
> 
>
> Key: CARBONDATA-3811
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3811
> Project: CarbonData
> Issue Type: Bug
> Environment: opensource ANT cluster
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: Flat_folder_returning_zero.png
>
>
> Flat folder table is returning no records for select queries.
>  
> Test queries:
> drop table if exists uniqdata1;
> CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata TBLPROPERTIES('flat_folder'='true');
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata1 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> select count(*) from uniqdata1;--0;
> select * from uniqdata1 limit 10;--0;





[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3807:
--
Attachment: bloom-show index.png

> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: bloom-filtercolumn-plan.png, bloom-show index.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3807:
--
Attachment: bloom-filtercolumn-plan.png

> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: bloom-filtercolumn-plan.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3807:
--
Attachment: (was: bloom-filtercolumn-plan.png)

> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: bloom-filtercolumn-plan.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3807:
--
Attachment: (was: bloom-show index.png)

> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: bloom-filtercolumn-plan.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Updated] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.

2020-05-10 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3811:
--
Attachment: (was: Flat_folder_returning_zero.png)

> In Flat folder enabled table, it is returning no records while querying.
> 
>
> Key: CARBONDATA-3811
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3811
> Project: CarbonData
> Issue Type: Bug
> Environment: opensource ANT cluster
> Reporter: Prasanna Ravichandran
> Priority: Major
>
> Flat folder table is returning no records for select queries.
>  
> Test queries:
> drop table if exists uniqdata1;
> CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata TBLPROPERTIES('flat_folder'='true');
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata1 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> select count(*) from uniqdata1;--0;
> select * from uniqdata1 limit 10;--0;





[jira] [Created] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.

2020-05-08 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3811:
-

             Summary: In Flat folder enabled table, it is returning no records while querying.
                 Key: CARBONDATA-3811
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3811
             Project: CarbonData
          Issue Type: Bug
         Environment: opensource ANT cluster
            Reporter: Prasanna Ravichandran
         Attachments: Flat_folder_returning_zero.png

Flat folder table is returning no records for select queries.

 

Test queries:

drop table if exists uniqdata1;
CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata TBLPROPERTIES('flat_folder'='true');
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata1 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
select count(*) from uniqdata1; --returns 0;
select * from uniqdata1 limit 10; --returns no rows;





[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-08 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-3807:
--
Description: 
Filter queries and projection queries with bloom columns are not hitting the 
bloom datamap.

 According to the query plan, the bloom datamap is unused even though it was created.

Test queries: 

drop table if exists uniqdata;
 CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
 load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');

show indexes on uniqdata;

explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting;

explain select cust_name from uniqdata; --not hitting;

 

 

  was:
Filter queries and projection queries with bloom columns are not hitting the 
bloom datamap.

 

Test queries: 

drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');

show indexes on uniqdata;

explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting;

explain select cust_name from uniqdata; --not hitting;

 

 


> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
> ---
>
> Key: CARBONDATA-3807
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
> Project: CarbonData
> Issue Type: Bug
> Environment: Ant cluster - opensource
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: bloom-filtercolumn-plan.png, bloom-show index.png
>
>
> Filter queries and projection queries with bloom columns are not hitting the 
> bloom datamap.
>  Bloom datamap is unused as per plan, even though created.
> Test queries: 
> drop table if exists uniqdata;
>  CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> bigint,decimal_column1 decimal(30,10), decimal_column2 
> decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> int) stored as carbondata;
>  load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> table uniqdata 
> options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
> PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');
> show indexes on uniqdata;
> explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; 
> --not hitting;
> explain select cust_name from uniqdata; --not hitting;
>  
>  





[jira] [Created] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.

2020-05-07 Thread Prasanna Ravichandran (Jira)
Prasanna Ravichandran created CARBONDATA-3807:
-

             Summary: Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
                 Key: CARBONDATA-3807
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3807
             Project: CarbonData
          Issue Type: Bug
         Environment: Ant cluster - opensource
            Reporter: Prasanna Ravichandran
         Attachments: bloom-filtercolumn-plan.png, bloom-show index.png

Filter queries and projection queries with bloom columns are not hitting the 
bloom datamap.

 

Test queries: 

drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, 
dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
bigint,decimal_column1 decimal(30,10), decimal_column2 
decimal(36,36),double_column1 double, double_column2 double,integer_column1 
int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table 
uniqdata 
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');

create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' 
PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1');

show indexes on uniqdata;

explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting;

explain select cust_name from uniqdata; --not hitting;

 

 





[jira] [Created] (CARBONDATA-2920) For the Long string data, the local dictionary threshold is not reached even if the threshold condition is met

2018-09-07 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2920:
-

             Summary: For the Long string data, the local dictionary threshold is not reached even if the threshold condition is met
                 Key: CARBONDATA-2920
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2920
             Project: CarbonData
          Issue Type: Bug
         Environment: 3 node cluster 
            Reporter: Prasanna Ravichandran


For the Long string data, the local dictionary threshold is not reached even if 
the threshold condition is met.

【Test step】: 
1. Create a table with a long string column and a local dictionary threshold of 1000.
2. Load more than 1000 distinct long string values.
3. Check whether the threshold is met.

*Test queries:*

drop table if exists 1klongdata;
create table 1klongdata(st string) stored by 'carbondata' 
TBLPROPERTIES('local_dictionary_enable'='true','local_dictionary_threshold'='1000','long_string_columns'='st');
load data inpath "hdfs://hacluster/user/prasanna/1005longdata.csv" into table 
1klongdata options('fileheader'='st');


【Expected Output】: Once the local dictionary threshold is crossed, the executor 
log should show "Local Dictionary threshold reached for the column: col_name, Unable 
to generate dictionary value. Dictionary threshold reached". 
【Actual Output】: The fallback details are not printed for long string data even 
when the threshold limit is reached.
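
The expected behaviour can be sketched in Python as follows. This is a simplified model, not CarbonData code: the class LocalDictionaryEncoder is hypothetical, fallback is assumed to trigger when a new distinct value would exceed the threshold, and the log text is copied from the expected output above.

```python
class LocalDictionaryEncoder:
    """Hypothetical model of a per-column local dictionary with fallback."""

    def __init__(self, column, threshold):
        self.column = column
        self.threshold = threshold
        self.dictionary = {}      # distinct value -> surrogate key
        self.fallen_back = False
        self.log = []             # stands in for the executor log

    def encode(self, value):
        # After fallback, values are stored as-is (simplification).
        if self.fallen_back:
            return value
        if value in self.dictionary:
            return self.dictionary[value]
        if len(self.dictionary) >= self.threshold:
            # A new distinct value would exceed the threshold: fall back once and log it.
            self.fallen_back = True
            self.log.append(
                f"Local Dictionary threshold reached for the column: {self.column}, "
                "Unable to generate dictionary value. Dictionary threshold reached"
            )
            return value
        key = len(self.dictionary)
        self.dictionary[value] = key
        return key
```

With a threshold of 1000 and 1005 distinct values, as in the test steps, this model falls back on the 1001st distinct value and logs the message exactly once.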



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2892:
--
Attachment: (was: Array.csv)

> Data mismatch is seen in the Array-String and Array-Timestamp.
> --
>
> Key: CARBONDATA-2892
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2892
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: Array.csv
>
>
> Data mismatch is seen in the Array-String and Array-Timestamp columns, such as 
> mismatches in data, order, and date values. 
> *Test queries:*
> drop table if exists array_com_hive;
> create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, 
> GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT 
> array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$';
> load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive;
> drop table if exists array_com;
> create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER 
> string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING 
> array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, 
> CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon;
> insert into Array_com select * from array_com_hive;
> select * from array_com_hive order by CUST_ID ASC limit 3;
> select * from array_com order by CUST_ID ASC limit 3;
> *Expected result:*
> There should be no data mismatch, and the data in the table should be the same as 
> in the CSV file.
> *Actual result:*
> Data mismatch is seen.
>  
>  
>  





[jira] [Updated] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2892:
--
Attachment: Array.csv

> Data mismatch is seen in the Array-String and Array-Timestamp.
> --
>
> Key: CARBONDATA-2892
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2892
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: Array.csv
>
>
> Data mismatch is seen in the Array-String and Array-Timestamp columns, such as 
> mismatches in data, order, and date values. 
> *Test queries:*
> drop table if exists array_com_hive;
> create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, 
> GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT 
> array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$';
> load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive;
> drop table if exists array_com;
> create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER 
> string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING 
> array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, 
> CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon;
> insert into Array_com select * from array_com_hive;
> select * from array_com_hive order by CUST_ID ASC limit 3;
> select * from array_com order by CUST_ID ASC limit 3;
> *Expected result:*
> There should be no data mismatch, and the data in the table should be the same as 
> in the CSV file.
> *Actual result:*
> Data mismatch is seen.
>  
>  
>  





[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2893:
--
Attachment: structofarray.csv

> Job aborted during insert while loading the "Struct of Array" datatype values.
> --
>
> Key: CARBONDATA-2893
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: structofarray.csv
>
>
> Job aborted during insert while loading the "Struct of Array" datatype values.
> *Test queries:*
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.026 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$' 
> map keys terminated by '&';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.159 seconds)
> 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
> into table STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.217 seconds)
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.03 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
> MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> using carbon;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.099 seconds)
> 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
> STRUCT_OF_ARRAY_com_hive;
> *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*
>  
> *Expected result:*
> The insert should succeed.
> *Actual result:*
> The insert fails with "Job aborted".
>  





[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2893:
--
Attachment: (was: structofarray.csv)

> Job aborted during insert while loading the "Struct of Array" datatype values.
> --
>
> Key: CARBONDATA-2893
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: structofarray.csv
>
>
> Job aborted during insert while loading the "Struct of Array" datatype values.
> *Test queries:*
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.026 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$' 
> map keys terminated by '&';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.159 seconds)
> 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
> into table STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.217 seconds)
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.03 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
> MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> using carbon;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.099 seconds)
> 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
> STRUCT_OF_ARRAY_com_hive;
> *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*
>  
> *Expected result:*
> The insert should succeed.
> *Actual result:*
> The insert fails with "Job aborted".
>  





[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2893:
--
Attachment: structofarray.csv

> Job aborted during insert while loading the "Struct of Array" datatype values.
> --
>
> Key: CARBONDATA-2893
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: structofarray.csv
>
>
> Job aborted during insert while loading the "Struct of Array" datatype values.
> *Test queries:*
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.026 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$' 
> map keys terminated by '&';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.159 seconds)
> 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
> into table STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.217 seconds)
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.03 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
> MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> using carbon;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.099 seconds)
> 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
> STRUCT_OF_ARRAY_com_hive;
> *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*
>  
> *Expected result:*
> The insert should succeed.
> *Actual result:*
> The insert fails with "Job aborted".
>  





[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2893:
--
Attachment: (was: arrayofstruct.csv)

> Job aborted during insert while loading the "Struct of Array" datatype values.
> --
>
> Key: CARBONDATA-2893
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: structofarray.csv
>
>
> Job aborted during insert while loading the "Struct of Array" datatype values.
> *Test queries:*
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.026 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$' 
> map keys terminated by '&';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.159 seconds)
> 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
> into table STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.217 seconds)
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.03 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
> MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) 
> using carbon;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.099 seconds)
> 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
> STRUCT_OF_ARRAY_com_hive;
> *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*
>  
> *Expected result:*
> The insert should succeed.
> *Actual result:*
> The insert fails with "Job aborted".
>  





[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2893:
--
Description: 
Job aborted during insert while loading the "Struct of Array" datatype values.

*Test queries:*

0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.026 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
format delimited fields terminated by ',' collection items terminated by '$' 
map keys terminated by '&';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.159 seconds)
0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
into table STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.217 seconds)
0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using 
carbon;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.099 seconds)
0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
STRUCT_OF_ARRAY_com_hive;
*Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*

 

 *Expected result:*

The insert should succeed.

*Actual result:*

The insert fails with "Job aborted".

 

  was:
Job aborted during insert while loading the "Struct of Array" datatype values.

*Test queries:*

0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.026 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
format delimited fields terminated by ',' collection items terminated by '$' 
map keys terminated by '&';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.159 seconds)
0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
into table STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.217 seconds)
0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using 
carbon;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.099 seconds)
0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
STRUCT_OF_ARRAY_com_hive;
*Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*

 

 


> Job aborted during insert while loading the "Struct of Array" datatype values.
> --
>
> Key: CARBONDATA-2893
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
> Project: CarbonData
>  Issue Type: Bug
> Environment: 3 Node ANT.
>Reporter: Prasanna Ravichandran
>Priority: Major
> Attachments: arrayofstruct.csv
>
>
> Job aborted during insert while loading the "Struct of Array" datatype values.
> *Test queries:*
> 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.026 seconds)
> 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
> int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
> STRUCT_OF_ARRAY struct,sal1: 
> array,state: array,date1: array>,CARD_COUNT 
> int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
> format delimited fields terminated by ',' collection items terminated by '$' 
> map keys terminated by '&';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.159 seconds)
>  0: 

[jira] [Created] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.

2018-08-27 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2893:
-

 Summary: Job aborted during insert while loading the "Struct of 
Array" datatype values.
 Key: CARBONDATA-2893
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2893
 Project: CarbonData
  Issue Type: Bug
 Environment: 3 Node ANT.
Reporter: Prasanna Ravichandran
 Attachments: arrayofstruct.csv

Job aborted during insert while loading the "Struct of Array" datatype values.

*Test queries:*

0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.026 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR 
int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row 
format delimited fields terminated by ',' collection items terminated by '$' 
map keys terminated by '&';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.159 seconds)
0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' 
into table STRUCT_OF_ARRAY_com_hive;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.217 seconds)
0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.03 seconds)
0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, 
MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, 
STRUCT_OF_ARRAY struct,sal1: 
array,state: array,date1: array>,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using 
carbon;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.099 seconds)
0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from 
STRUCT_OF_ARRAY_com_hive;
*Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)*

 

 





[jira] [Created] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.

2018-08-27 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2892:
-

 Summary: Data mismatch is seen in the Array-String and 
Array-Timestamp.
 Key: CARBONDATA-2892
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2892
 Project: CarbonData
  Issue Type: Bug
 Environment: 3 Node ANT.
Reporter: Prasanna Ravichandran
 Attachments: Array.csv

Data mismatch is seen in the Array-String and Array-Timestamp columns, such as 
mismatches in data, order, and date values. 

*Test queries:*

drop table if exists array_com_hive;
create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, 
GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT 
array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT 
int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row 
format delimited fields terminated by ',' collection items terminated by '$';
load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive;
drop table if exists array_com;
create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER 
string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING 
array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, 
CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon;
insert into Array_com select * from array_com_hive;
select * from array_com_hive order by CUST_ID ASC limit 3;
select * from array_com order by CUST_ID ASC limit 3;

*Expected result:*

There should be no data mismatch, and the data in the table should be the same as 
in the CSV file.

*Actual result:*

Data mismatch is seen.

 

 

 





[jira] [Closed] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.

2018-08-07 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-2822.
-

> Carbon Configuration - "carbon.invisible.segments.preserve.count"  
> configuration property is not working as expected.
> -
>
> Key: CARBONDATA-2822
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2822
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, file-format
> Environment: 3 Node ANT cluster.
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Attachments: configuration.png
>
>
> The *carbon.invisible.segments.preserve.count* configuration is not working 
> as expected.
> +*Steps to reproduce:*+
> 1) Set "*carbon.invisible.segments.preserve.count=20*" in carbon.properties 
> and restart the thrift server.
> 2) Perform 40 loads and 4 compactions.
> 3) Perform clean files, so that the tablestatus.history file is generated 
> with the invisible segment details.
> In total, 44 segments are created, including visible and invisible segments: 
> 40 load segments (segment IDs 0,1,2...39) + 4 new compacted segments 
> (0.1, 20.1, 22.1, 0.2).
> Of these, *41 segments are listed in the "tablestatus.history" file* (which 
> holds the details of invisible segments, i.e. those marked for delete or 
> compacted) and 3 segments are listed in the "tablestatus" file (which holds 
> the visible segments: 0.2, the final compacted segment, along with the first 
> segment, segment 0, and the last segment, segment 39). *But the invisible 
> segment preserve count is configured as 20, which is not followed for the 
> tablestatus.history file.*
> +*Expected result:*+
> The tablestatus.history file should preserve only the latest 20 segments, as 
> per the configuration.
> +*Actual result:*+
> The tablestatus.history file has the details of 41 invisible segments, which 
> is above the configured value of 20.
>  
> This was tested with an ANT cluster.





[jira] [Resolved] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.

2018-08-07 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran resolved CARBONDATA-2822.
---
Resolution: Invalid

Working fine.

> Carbon Configuration - "carbon.invisible.segments.preserve.count"  
> configuration property is not working as expected.
> -
>
> Key: CARBONDATA-2822
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2822
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, file-format
> Environment: 3 Node ANT cluster.
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Attachments: configuration.png
>
>
> The *carbon.invisible.segments.preserve.count* configuration is not working 
> as expected.
> +*Steps to reproduce:*+
> 1) Set "*carbon.invisible.segments.preserve.count=20*" in carbon.properties 
> and restart the thrift server.
> 2) Perform 40 loads and 4 compactions.
> 3) Perform clean files, so that the tablestatus.history file is generated 
> with the invisible segment details.
> In total, 44 segments are created, including visible and invisible segments: 
> 40 load segments (segment IDs 0,1,2...39) + 4 new compacted segments 
> (0.1, 20.1, 22.1, 0.2).
> Of these, *41 segments are listed in the "tablestatus.history" file* (which 
> holds the details of invisible segments, i.e. those marked for delete or 
> compacted) and 3 segments are listed in the "tablestatus" file (which holds 
> the visible segments: 0.2, the final compacted segment, along with the first 
> segment, segment 0, and the last segment, segment 39). *But the invisible 
> segment preserve count is configured as 20, which is not followed for the 
> tablestatus.history file.*
> +*Expected result:*+
> The tablestatus.history file should preserve only the latest 20 segments, as 
> per the configuration.
> +*Actual result:*+
> The tablestatus.history file has the details of 41 invisible segments, which 
> is above the configured value of 20.
>  
> This was tested with an ANT cluster.





[jira] [Commented] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.

2018-08-07 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571437#comment-16571437
 ] 

Prasanna Ravichandran commented on CARBONDATA-2822:
---

The property *"carbon.invisible.segments.preserve.count"* applies to the 
tablestatus file only. When this property is set and the number of invisible 
segments in the tablestatus file exceeds the configured 
*carbon.invisible.segments.preserve.count* value, all the invisible segments 
are moved to the tablestatus.history file. It is working as expected.
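
The behaviour described in this comment can be modelled with a short Python sketch. It is only an illustration of the statement above, not CarbonData code: Segment, INVISIBLE_STATUSES and clean_files are hypothetical names, and the status labels are illustrative.

```python
from dataclasses import dataclass

# Illustrative status labels that count as "invisible" in this sketch.
INVISIBLE_STATUSES = {"Compacted", "Marked for Delete"}


@dataclass(frozen=True)
class Segment:
    segment_id: str
    status: str


def clean_files(tablestatus, history, preserve_count):
    """Return (new_tablestatus, new_history).

    If the tablestatus list holds more than preserve_count invisible
    segments, all of them are moved to the history list; otherwise
    both lists are returned unchanged.
    """
    invisible = [s for s in tablestatus if s.status in INVISIBLE_STATUSES]
    if len(invisible) <= preserve_count:
        return list(tablestatus), list(history)
    visible = [s for s in tablestatus if s.status not in INVISIBLE_STATUSES]
    return visible, list(history) + invisible
```

With 41 invisible and 3 visible segments and a preserve count of 20, this model leaves 3 entries in tablestatus and 41 in the history list, which matches the counts in the report. The preserve count limits what stays in tablestatus and does not cap the history file.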

 

> Carbon Configuration - "carbon.invisible.segments.preserve.count"  
> configuration property is not working as expected.
> -
>
> Key: CARBONDATA-2822
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2822
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, file-format
> Environment: 3 Node ANT cluster.
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Attachments: configuration.png
>
>
> The *carbon.invisible.segments.preserve.count* configuration is not working 
> as expected.
> +*Steps to reproduce:*+
> 1) Set "*carbon.invisible.segments.preserve.count=20*" in carbon.properties 
> and restart the thrift server.
> 2) Perform 40 loads and 4 compactions.
> 3) Perform clean files, so that the tablestatus.history file is generated 
> with the invisible segment details.
> In total, 44 segments are created, including visible and invisible segments: 
> 40 load segments (segment IDs 0,1,2...39) + 4 new compacted segments 
> (0.1, 20.1, 22.1, 0.2).
> Of these, *41 segments are listed in the "tablestatus.history" file* (which 
> holds the details of invisible segments, i.e. those marked for delete or 
> compacted) and 3 segments are listed in the "tablestatus" file (which holds 
> the visible segments: 0.2, the final compacted segment, along with the first 
> segment, segment 0, and the last segment, segment 39). *But the invisible 
> segment preserve count is configured as 20, which is not followed for the 
> tablestatus.history file.*
> +*Expected result:*+
> The tablestatus.history file should preserve only the latest 20 segments, as 
> per the configuration.
> +*Actual result:*+
> The tablestatus.history file has the details of 41 invisible segments, which 
> is above the configured value of 20.
>  
> This was tested with an ANT cluster.





[jira] [Updated] (CARBONDATA-2816) MV Datamap - With the hive metastore disabled, MV is not working as expected.

2018-08-03 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2816:
--
Description: 
When the hive metastore is disabled (spark.carbon.hive.schema.store=false), the 
issues below are seen.

CARBONDATA-2534

CARBONDATA-2539

CARBONDATA-2576

 

  was:
When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then 
the below issues are seen.

CARBONDATA-2540

CARBONDATA-2539

CARBONDATA-2576

 


> MV Datamap - With the hive metastore disabled, MV is not working as expected.
> -
>
> Key: CARBONDATA-2816
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2816
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: MV
>
> When the hive metastore is disabled (spark.carbon.hive.schema.store=false), 
> the issues below are seen.
> CARBONDATA-2534
> CARBONDATA-2539
> CARBONDATA-2576
>  





[jira] [Created] (CARBONDATA-2816) MV Datamap - With the hive metastore disabled, MV is not working as expected.

2018-08-02 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2816:
-

 Summary: MV Datamap - With the hive metastore disabled, MV is not 
working as expected.
 Key: CARBONDATA-2816
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2816
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Reporter: Prasanna Ravichandran


When the hive metastore is disabled (spark.carbon.hive.schema.store=false), 
the issues below are seen.

CARBONDATA-2540

CARBONDATA-2539

CARBONDATA-2576

 





[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-08-01 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2576:
--
Description: 
MV is not working correctly if there are more than 3 aggregate functions in the 
same datamap. It works fine with up to 3 aggregate functions in the same MV. 
Please see the attached document for more details.
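
The logical plan in the error quoted further down re-aggregates the stored max, sum and min columns (max(max_projectenddate), sum(sum_salary), min(min_projectjoindate)) but selects avg_attendance without any aggregate. The report does not state the root cause. The sketch below only illustrates, with made-up numbers, why a stored average cannot be rolled up the way max, sum and min can, and why a sum and count pair is usually kept instead. All names in it are hypothetical.

```python
def rollup_avg_naive(partial_avgs):
    """Average of averages: only correct when every partial has equal size."""
    return sum(partial_avgs) / len(partial_avgs)


def rollup_avg(partials):
    """Roll up (sum, count) pairs into one exact average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Made-up attendance values for one empno, split across two partial results.
part_a = [10, 20, 30]
part_b = [100]
all_rows = part_a + part_b

# max, min and sum can be re-aggregated from their partial values.
assert max(max(part_a), max(part_b)) == max(all_rows)
assert min(min(part_a), min(part_b)) == min(all_rows)
assert sum([sum(part_a), sum(part_b)]) == sum(all_rows)

exact = rollup_avg([(sum(p), len(p)) for p in (part_a, part_b)])
naive = rollup_avg_naive([sum(p) / len(p) for p in (part_a, part_b)])
print(exact, naive)  # 40.0 60.0
```

The exact value comes from the combined sum and count (160 / 4), while the naive roll-up averages 20.0 and 100.0 and gets a different answer.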

Test queries:

 

scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

++
  
 ++

++

 

 rebuild data

scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)

++
  
 ++

++

 

 

scala> carbon.sql("explain select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

org.apache.spark.sql.AnalysisException: expression 
'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
group by, nor is it an aggregate function. Add to group by or wrap in first() 
(or first_value) if you don't care which value you get.;;

Aggregate [origintable_empno#2925], [origintable_empno#2925 AS 
empno#3002, max(max_projectenddate#2926) AS max(projectenddate)#3003, 
sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]

+- SubqueryAlias datamap_comp_maxsumminavg_table

   +- 
Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
 CarbonDatasourceHadoopRelation [ Database name :default, Table name 
:datamap_comp_maxsumminavg_table, Schema 
:Some(StructType(StructField(origintable_empno,IntegerType,true), 
StructField(max_projectenddate,TimestampType,true), 
StructField(sum_salary,LongType,true), 
StructField(min_projectjoindate,TimestampType,true), 
StructField(avg_attendance,DoubleType,true))) ]

 

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)

  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)

  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)

  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27)

  at 

[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565167#comment-16565167
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

When the user executes the query on which the MV datamap is defined, the data 
should be accessed from MV_Table.

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_substring.docx, data.csv
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> MV creation is not working with the substring function. We are getting a 
> spark.sql.AnalysisException while trying to create an MV with the substring 
> and aggregate functions. 
> *Spark-shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a 
> column whose name contains commas in Hive metastore. Table: 
> `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  *at* 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316)
>  at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  at 
> org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>  at 
> org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103)
>  at 
> org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
>  at 
> org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at 

[jira] [Closed] (CARBONDATA-2528) MV Datamap - When the MV is created with the order by, then when we execute the corresponding query defined in MV with order by, then the data is not accessed from th

2018-08-01 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-2528.
-

Closed.

> MV Datamap - When the MV is created with the order by, then when we execute 
> the corresponding query defined in MV with order by, then the data is not 
> accessed from the MV. 
> 
>
> Key: CARBONDATA-2528
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2528
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node Opensource ANT cluster. (Opensource Hadoop 2.7.2+ 
> Opensource Spark 2.2.1+ Opensource Carbondata 1.3.1)
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_orderby.docx, data.csv
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When an MV is created with an order by clause and the corresponding query 
> defined in the MV is then executed along with order by, the data is not 
> accessed from the MV. The data is accessed from the main table only. 
> Test queries:
> create datamap MV_order using 'mv' as select 
> empno,sum(salary)+sum(utilization) as total from originTable group by empno 
> order by empno;
> create datamap MV_desc_order using 'mv' as select 
> empno,sum(salary)+sum(utilization) as total from originTable group by empno 
> order by empno DESC;
> rebuild datamap MV_order;
> rebuild datamap MV_desc_order;
> explain select empno,sum(salary)+sum(utilization) as total from originTable 
> group by empno order by empno;
> explain select empno,sum(salary)+sum(utilization) as total from originTable 
> group by empno order by empno DESC;
> Expected result: A query with order by that matches the MV definition should 
> access data from the MV table only.
>  
> Please see the attached document for more details.





[jira] [Closed] (CARBONDATA-2540) MV Dataset - Unionall queries are not fetching data from MV dataset.

2018-08-01 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-2540.
-

> MV Dataset - Unionall queries are not fetching data from MV dataset.
> 
>
> Key: CARBONDATA-2540
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2540
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: data_mv.csv
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Unionall queries are not fetching data from MV dataset. 
> Test queries:
> scala> carbon.sql("drop table if exists fact_table1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("CREATE TABLE fact_table1 (empno int, empname String, 
> designation String, doj Timestamp,workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String,projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance 
> int,utilization int,salary int)STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("drop table if exists fact_table2").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("CREATE TABLE fact_table2 (empno int, empname String, 
> designation String, doj Timestamp,workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String,projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance 
> int,utilization int,salary int)STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("create datamap mv_unional using 'mv' as Select Z.empno 
> From (Select empno,empname From fact_table1 Union All Select empno,empname 
> from fact_table2) As Z Group By Z.empno").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("rebuild datamap mv_unional").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain Select Z.empno From (Select empno,empname From 
> fact_table1 Union All Select empno,empname from fact_table2) As Z Group By 
> Z.empno").show(200,false)
> 

[jira] [Commented] (CARBONDATA-2540) MV Dataset - Unionall queries are not fetching data from MV dataset.

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565129#comment-16565129
 ] 

Prasanna Ravichandran commented on CARBONDATA-2540:
---

Validation added. Closed.

*Terminal:*

> create datamap mv_unional using 'mv' as Select Z.empno From (Select 
> empno,empname From fact_table1 Union All Select empno,empname from 
> fact_table2) As Z Group By Z.empno;
*Error: java.lang.UnsupportedOperationException: MV is not supported for this 
query (state=,code=0)*

> MV Dataset - Unionall queries are not fetching data from MV dataset.
> 
>
> Key: CARBONDATA-2540
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2540
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: data_mv.csv
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Unionall queries are not fetching data from MV dataset. 
> Test queries:
> scala> carbon.sql("drop table if exists fact_table1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("CREATE TABLE fact_table1 (empno int, empname String, 
> designation String, doj Timestamp,workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String,projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance 
> int,utilization int,salary int)STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("drop table if exists fact_table2").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("CREATE TABLE fact_table2 (empno int, empname String, 
> designation String, doj Timestamp,workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String,projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance 
> int,utilization int,salary int)STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("create datamap mv_unional using 'mv' as Select Z.empno 
> From (Select empno,empname From fact_table1 Union All Select empno,empname 
> from fact_table2) As Z Group By Z.empno").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("rebuild datamap mv_unional").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain Select Z.empno From (Select empno,empname From 
> fact_table1 Union All Select empno,empname from fact_table2) As Z Group By 
> Z.empno").show(200,false)
> 

[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565103#comment-16565103
 ] 

Prasanna Ravichandran commented on CARBONDATA-2539:
---

The sub-queries are still not accessing the data from the MV datamap.

Terminal:

> create datamap dm3 using 'mv' as *select min(workgroupcategory) from 
> origintable*;
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.392 seconds)

> select distinct workgroupcategory from originTable;
++--+
| workgroupcategory |
++--+
| 1 |
| 3 |
| 2 |
++--+
3 rows selected (0.664 seconds)

> select count(*) from originTable where workgroupcategory=1;
+---+--+
| count(1) |
+---+--+
| 5 |
+---+--+
1 row selected (0.349 seconds)

> explain SELECT max(empno) FROM originTable WHERE workgroupcategory IN 
>(*select min(workgroupcategory) from originTable*) group by empname;
+-+--+
| plan |
+-+--+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*HashAggregate(keys=[empname#24982], functions=[max(empno#24981)])
+- Exchange hashpartitioning(empname#24982, 200)
 +- *HashAggregate(keys=[empname#24982], functions=[partial_max(empno#24981)])
 +- *Project [empno#24981, empname#24982]
 +- *BroadcastHashJoin [workgroupcategory#24985], 
[*min(workgroupcategory)*#25804], LeftSemi, BuildRight
 :- *FileScan carbondata 
*rtyo.origintable*[empno#24981,empname#24982,designation#24983,doj#24984,workgroupcategory#24985,workgroupcategoryname#24986,deptno#24987,deptname#24988,projectcode#24989,projectjoindate#24990,projectenddate#24991,attendance#24992,utilization#24993,salary#24994]
 +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] 
as bigint)))
 +- *HashAggregate(keys=[], functions=[min(workgroupcategory#24985)])
 +- Exchange SinglePartition
 +- *HashAggregate(keys=[], functions=[partial_min(workgroupcategory#24985)])
 +- *FileScan carbondata *rtyo.origintable*[workgroupcategory#24985] |

[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565016#comment-16565016
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

MV creation with the substring function now works without any error, but when 
the user runs the query defined in the MV, it does not access the data from the 
MV datamap.

*Terminal:*

> create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation;
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.661 seconds)

> explain select sum(salary),substring(empname,2,5),designation from 
> originTable group by substring(empname,2,5),designation;
+--+--+
| plan |
+--+--+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 2
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*HashAggregate(keys=[substring(empname#18267, 2, 5)#18352, designation#18268], 
functions=[sum(cast(salary#18279 as bigint))])
+- Exchange hashpartitioning(substring(empname#18267, 2, 5)#18352, 
designation#18268, 200)
 +- *HashAggregate(keys=[substring(empname#18267, 2, 5) AS 
substring(empname#18267, 2, 5)#18352, designation#18268], 
functions=[partial_sum(cast(salary#18279 as bigint))])
 +- *FileScan carbondata 
*b011.origintable*[empname#18267,designation#18268,salary#18279] |
+--+--+
2 rows selected (0.432 seconds)

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_substring.docx, data.csv
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> MV creation is not working with the substring function. We are getting a 
> spark.sql.AnalysisException while trying to create an MV with the substring 
> and aggregate functions. 
> *Spark-shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a 
> column whose name contains commas in Hive metastore. Table: 
> `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  *at* 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at 
> 

[jira] [Commented] (CARBONDATA-2528) MV Datamap - When the MV is created with the order by, then when we execute the corresponding query defined in MV with order by, then the data is not accessed from

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565003#comment-16565003
 ] 

Prasanna Ravichandran commented on CARBONDATA-2528:
---

The data is now fetched from the MV datamap for the order by queries. Working 
fine.

 explain select attendance,sum(salary)+sum(utilization) as total from 
originTable group by attendance order by attendance DESC;
++--+
| plan |
++--+
| == CarbonData Profiler ==
Table Scan on mv_desc_attendance_table
 - total blocklets: 4
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*Sort [attendance#12952 DESC NULLS LAST], true, 0
+- Exchange rangepartitioning(attendance#12952 DESC NULLS LAST, 200)
 +- *Project [origintable_attendance#12897 AS attendance#12952, total#12898L]
 +- *FileScan carbondata 
b011.*mv_desc_attendance_table*[origintable_attendance#12897,total#12898L] |
++--+

explain select empno,sum(salary)+sum(utilization) as total from originTable 
group by empno order by empno;
+-+--+
| plan |
+-+--+
| == CarbonData Profiler ==
Table Scan on mv_order_table
 - total blocklets: 6
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*Sort [empno#12822 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(empno#12822 ASC NULLS FIRST, 200)
 +- *Project [origintable_empno#10724 AS empno#12822, total#10725L]
 +- *FileScan carbondata 
b011.mv_order_table[origintable_empno#10724,total#10725L] |
+-+--+
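
Whether a query was rewritten to use the MV is judged in this thread by reading the "Table Scan on ..." lines of the CarbonData Profiler section. A small helper can do the same check on captured explain output. It is a sketch only: the function names are hypothetical, and the rule that the MV table is named <datamap name>_table is inferred from the outputs quoted in these messages (for example mv_order_table), not taken from CarbonData documentation.

```python
import re


def scanned_tables(explain_text):
    """Return the table names that follow 'Table Scan on' in explain output."""
    return re.findall(r"^\s*Table Scan on (\S+)", explain_text, flags=re.MULTILINE)


def uses_mv(explain_text, datamap_name):
    """True if any scanned table matches the assumed '<datamap>_table' name."""
    mv_table = datamap_name.lower() + "_table"
    return any(t.lower() == mv_table for t in scanned_tables(explain_text))


# Profiler excerpts copied from the explain outputs in this thread.
hit = """| == CarbonData Profiler ==
Table Scan on mv_order_table
 - total blocklets: 6
 - filter: none
"""
miss = """| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 2
 - filter: none
"""

print(uses_mv(hit, "MV_order"))    # True
print(uses_mv(miss, "mv_substr"))  # False
```

The comparison is case-insensitive because the datamap is created as MV_order while the plan prints mv_order_table.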

 

 

 

 

> MV Datamap - When the MV is created with the order by, then when we execute 
> the corresponding query defined in MV with order by, then the data is not 
> accessed from the MV. 
> 
>
> Key: CARBONDATA-2528
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2528
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node Opensource ANT cluster. (Opensource Hadoop 2.7.2+ 
> Opensource Spark 2.2.1+ Opensource Carbondata 1.3.1)
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_orderby.docx, data.csv
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When an MV is created with an order by clause and the corresponding query 
> defined in the MV is then executed along with order by, the data is not 
> accessed from the MV. The data is accessed from the main table only. 
> Test queries:
> create datamap MV_order using 'mv' as select 
> empno,sum(salary)+sum(utilization) as total from originTable group by empno 
> order by empno;
> create datamap MV_desc_order using 'mv' as select 
> empno,sum(salary)+sum(utilization) as total from originTable group by empno 
> order by empno DESC;
> rebuild 

[jira] [Updated] (CARBONDATA-2731) Timeseries datamap queries should fetch data from the Timeseries datamap.

2018-07-11 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2731:
--
Description: 
When a timeseries datamap is created, the queries to which it applies are 
also defined. So when the user runs that same query after the TS datamap is 
created, the query should fetch its data from the TS datamap. 

Test queries:

create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry 
string, Activecity string,gamePointId double,deviceInformationId 
double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) 
STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1');
 LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/vardhandaterestruct.csv' INTO 
TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
'"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

CREATE DATAMAP agg0_time ON TABLE brinjal USING 'timeSeries' DMPROPERTIES 
('EVENT_TIME'='productionDate','SECOND_GRANULARITY'='1') AS SELECT 
productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate; 

0: jdbc:hive2://10.18.98.136:23040/default> show datamap on table brinjal;
 
+--+------
|DataMapName|ClassName|Associated Table|DataMap Properties|

+--+------
|agg0_time|timeSeries|*rp.brinjal_agg0_time*|'event_time'='productionDate', 
'second_granularity'='1'|
|sensor|preaggregate|rp.brinjal_sensor| |

+--+------
 2 rows selected (0.042 seconds)
 0: jdbc:hive2://10.18.98.136:23040/default> explain SELECT productionDate, 
SUM(imei) FROM brinjal GROUP BY productionDate;
 
+--++
|plan|

+--++
|== CarbonData Profiler ==
 Table Scan on brinjal
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0|
|== Physical Plan ==
 *HashAggregate(keys=[productionDate#155228], 
functions=[sum(cast(imei#155221 as double))])
 +- Exchange hashpartitioning(productionDate#155228, 200)
 +- *HashAggregate(keys=[productionDate#155228], 
functions=[partial_sum(cast(imei#155221 as double))])
 +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :rp, *Table 
name :brinjal,* Schema :Some(StructType(StructField(imei,StringType,true), 
StructField(amsize,StringType,true), StructField(channelsid,StringType,true), 
StructField(activecountry,StringType,true), 
StructField(activecity,StringType,true), 
StructField(gamepointid,DoubleType,true), 
StructField(deviceinformationid,DoubleType,true), 
StructField(productiondate,TimestampType,true), 
StructField(deliverydate,TimestampType,true), 
StructField(deliverycharge,DoubleType,true))) ] 

[jira] [Updated] (CARBONDATA-2731) Timeseries datamap queries should fetch data from the Timeseries datamap.

2018-07-11 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2731:
--
Description: 
When a timeseries datamap is created, the query it applies to is defined as 
part of it. So when the user runs that same query after the TS datamap has been 
created, the query should fetch its data from the TS datamap. 

Test queries:

create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry 
string, Activecity string,gamePointId double,deviceInformationId 
double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) 
STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1');
 LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/vardhandaterestruct.csv' INTO 
TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
'"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

CREATE DATAMAP agg0_time ON TABLE brinjal USING 'timeSeries' DMPROPERTIES 
('EVENT_TIME'='productionDate','SECOND_GRANULARITY'='1') AS SELECT 
productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate; 

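0: jdbc:hive2://10.18.98.136:23040/default> show datamap on table brinjal;
+-------------+--------------+------------------------+---------------------------------------------------------+
| DataMapName | ClassName    | Associated Table       | DataMap Properties                                      |
+-------------+--------------+------------------------+---------------------------------------------------------+
| agg0_time   | timeSeries   | *rp.brinjal_agg0_time* | 'event_time'='productionDate', 'second_granularity'='1' |
| sensor      | preaggregate | rp.brinjal_sensor      |                                                         |
+-------------+--------------+------------------------+---------------------------------------------------------+
2 rows selected (0.042 seconds)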
0: jdbc:hive2://10.18.98.136:23040/default> explain SELECT productionDate, 
SUM(imei) FROM brinjal GROUP BY productionDate;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on brinjal
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0 |
| == Physical Plan ==
*HashAggregate(keys=[productionDate#155228], functions=[sum(cast(imei#155221 as double))])
+- Exchange hashpartitioning(productionDate#155228, 200)
   +- *HashAggregate(keys=[productionDate#155228], functions=[partial_sum(cast(imei#155221 as double))])
      +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :rp, *Table 
name :brinjal,* Schema :Some(StructType(StructField(imei,StringType,true), 
StructField(amsize,StringType,true), StructField(channelsid,StringType,true), 
StructField(activecountry,StringType,true), 
StructField(activecity,StringType,true), 
StructField(gamepointid,DoubleType,true), 
StructField(deviceinformationid,DoubleType,true), 
StructField(productiondate,TimestampType,true), 
StructField(deliverydate,TimestampType,true), 
StructField(deliverycharge,DoubleType,true))) ] 

[jira] [Created] (CARBONDATA-2731) Timeseries datamap queries should fetch data from the Timeseries datamap.

2018-07-11 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2731:
-

             Summary: Timeseries datamap queries should fetch data from the Timeseries datamap.
                 Key: CARBONDATA-2731
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2731
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
    Affects Versions: 1.4.1
         Environment: Spark 2.2
            Reporter: Prasanna Ravichandran


When a timeseries datamap is created, the query it applies to is defined as 
part of it. So when the user runs that same query after the TS datamap has been 
created, the query should fetch its data from the TS datamap. 

Test queries:

create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry 
string, Activecity string,gamePointId double,deviceInformationId 
double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) 
STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1');
LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/vardhandaterestruct.csv' INTO 
TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= 
'"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 
'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

CREATE DATAMAP agg0_time ON TABLE brinjal USING 'timeSeries' DMPROPERTIES 
('EVENT_TIME'='productionDate','SECOND_GRANULARITY'='1') AS SELECT 
productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;


explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate; 
!image-2018-07-11-18-06-15-260.png!

0: jdbc:hive2://10.18.98.136:23040/default> show datamap on table brinjal;
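+-------------+--------------+------------------------+---------------------------------------------------------+
| DataMapName | ClassName    | Associated Table       | DataMap Properties                                      |
+-------------+--------------+------------------------+---------------------------------------------------------+
| agg0_time   | timeSeries   | *rp.brinjal_agg0_time* | 'event_time'='productionDate', 'second_granularity'='1' |
| sensor      | preaggregate | rp.brinjal_sensor      |                                                         |
+-------------+--------------+------------------------+---------------------------------------------------------+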
2 rows selected (0.042 seconds)
0: jdbc:hive2://10.18.98.136:23040/default> explain SELECT productionDate, 
SUM(imei) FROM brinjal GROUP BY productionDate;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on brinjal
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*HashAggregate(keys=[productionDate#155228], functions=[sum(cast(imei#155221 as double))])
+- Exchange hashpartitioning(productionDate#155228, 200)
   +- *HashAggregate(keys=[productionDate#155228], functions=[partial_sum(cast(imei#155221 as double))])
      +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :rp, *Table 
name :brinjal,* Schema :Some(StructType(StructField(imei,StringType,true), 
StructField(amsize,StringType,true), StructField(channelsid,StringType,true), 
StructField(activecountry,StringType,true), 

[jira] [Closed] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-2522.
-
Resolution: Invalid

> MV dataset when created with Joins, then it is not pointing towards the MV, 
> while executing that join query.
> 
>
>                 Key: CARBONDATA-2522
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2522
>             Project: CarbonData
>          Issue Type: Bug
>         Environment: 3 Node Opensource ANT Cluster.
>            Reporter: Prasanna Ravichandran
>            Priority: Minor
>              Labels: MV, Materialistic_Views
>         Attachments: MV_joins.docx, data_mv.csv, 
>                      image-2018-06-27-12-10-38-516.png
>
>
> When an MV datamap is created on joined tables, the explain output of that 
> join query points to the main table instead of the created MV datamap.  
> Queries:
> drop table if exists fact_table1;
> CREATE TABLE fact_table1 (empno int, empname String, designation String, doj 
> Timestamp,
> workgroupcategory int, workgroupcategoryname String, deptno int, deptname 
> String,
> projectcode int, projectjoindate Timestamp, projectenddate 
> Timestamp,attendance int,
> utilization int,salary int)
> STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> drop table if exists fact_table2;
> CREATE TABLE fact_table2 (empno int, empname String, designation String, doj 
> Timestamp,
> workgroupcategory int, workgroupcategoryname String, deptno int, deptname 
> String,
> projectcode int, projectjoindate Timestamp, projectenddate 
> Timestamp,attendance int,
> utilization int,salary int)
> STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> drop table if exists fact_table3;
> CREATE TABLE fact_table3 (empno int, empname String, designation String, doj 
> Timestamp,
> workgroupcategory int, workgroupcategoryname String, deptno int, deptname 
> String,
> projectcode int, projectjoindate Timestamp, projectenddate 
> Timestamp,attendance int,
> utilization int,salary int)
> STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO 
> TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '"','timestampformat'='dd-MM-');
> create datamap datamap25 using 'mv' as select t1.empname as c1, 
> t2.designation from fact_table1 t1,fact_table2 t2,fact_table3 t3  where 
> t1.empname = t2.empname and t1.empname=t3.empname;
> explain create datamap datamap25 using 'mv' as select t1.empname as c1, 
> t2.designation from fact_table1 t1,fact_table2 t2,fact_table3 t3  where 
> t1.empname = t2.empname and t1.empname=t3.empname;
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524656#comment-16524656
 ] 

Prasanna Ravichandran commented on CARBONDATA-2522:
---

!image-2018-06-27-12-10-38-516.png!

Working fine after rebuilding the datamap.
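
For reference, a minimal sketch of the rebuild step mentioned in this comment. It assumes the REBUILD DATAMAP statement described in the CarbonData 1.4.x MV datamap documentation; the exact statement the reporter ran is not recorded in this thread, so treat it as illustrative and check it against the docs for your version.

```sql
-- Assumption: manual refresh syntax for an MV datamap in CarbonData 1.4.x.
REBUILD DATAMAP datamap25;

-- Then re-check the plan of the join query itself
-- (not of the CREATE DATAMAP statement).
EXPLAIN SELECT t1.empname AS c1, t2.designation
FROM fact_table1 t1, fact_table2 t2, fact_table3 t3
WHERE t1.empname = t2.empname AND t1.empname = t3.empname;
```

If the rebuild took effect, the plan would be expected to scan the table backing the MV datamap rather than the three fact tables.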



[jira] [Updated] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2522:
--
Attachment: image-2018-06-27-12-10-38-516.png



[jira] [Closed] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-2537.
-
Resolution: Invalid

The user has to rebuild the datamap once after creating it. After that, it 
works fine.

> MV Dataset - User queries with 'having' condition is not accessing the data 
> from the MV datamap.
> 
>
>                 Key: CARBONDATA-2537
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2537
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>         Environment: 3 Node Opensource ANT cluster.
>            Reporter: Prasanna Ravichandran
>            Assignee: xubo245
>            Priority: Minor
>              Labels: Carbondata, MV, Materialistic_Views
>         Attachments: data.csv, image-2018-05-25-15-50-23-903.png, 
>                      image-2018-06-27-11-53-49-587.png, image-2018-06-27-11-54-31-158.png
>
>
> User queries with a 'having' condition are not accessing the data from the MV 
> datamap. They are accessing the data from the main table.
> Test queries - spark shell:
> scala>carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 'org.apache.carbondata.format'").show()
> ++
> ||
> ++
> ++
> scala>carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("select empno from originTable having 
> salary>1").show(200,false)
> +-----+
> |empno|
> +-----+
> |14   |
> |15   |
> |20   |
> |19   |
> +-----+
> scala> carbon.sql("create datamap mv_hav using 'mv' as select empno from 
> originTable having salary>1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain select empno from originTable having 
> salary>1").show(200,false)
> +---+
> |plan |
> +---+
> |== CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 1
>  - filter: (salary <> null and salary > 1)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
>  |
> |== Physical Plan ==
> *Project [empno#1131]
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, 
> Table name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> 

[jira] [Comment Edited] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524638#comment-16524638
 ] 

Prasanna Ravichandran edited comment on CARBONDATA-2537 at 6/27/18 6:25 AM:


User "HAVING" queries access the data from the created MV datamap only. 
The user has to rebuild the datamap once after creation. Closed.

!image-2018-06-27-11-54-31-158.png!


was (Author: prasanna ravichandran):
User queries are accessing the data from the created MV datamap. The user has 
to rebuild the datamap once after creation. Closed.

!image-2018-06-27-11-54-31-158.png!


[jira] [Commented] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524638#comment-16524638
 ] 

Prasanna Ravichandran commented on CARBONDATA-2537:
---

User queries are accessing the data from the created MV datamap. The user has 
to rebuild the datamap once after creation. Closed.

!image-2018-06-27-11-54-31-158.png!


[jira] [Updated] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2537:
--
Attachment: image-2018-06-27-11-54-31-158.png

> MV Dataset - User queries with 'having' condition is not accessing the data 
> from the MV datamap.
> 
>
> Key: CARBONDATA-2537
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2537
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 Node Opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Assignee: xubo245
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv, image-2018-05-25-15-50-23-903.png, 
> image-2018-06-27-11-53-49-587.png, image-2018-06-27-11-54-31-158.png
>
>
> User queries with a 'having' condition do not read data from the MV 
> datamap. They read the data from the main table instead.
> Test queries - spark shell:
> scala>carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 'org.apache.carbondata.format'").show()
> ++
> ||
> ++
> ++
> scala>carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("select empno from originTable having 
> salary>1").show(200,false)
> +-----+
> |empno|
> +-----+
> |14   |
> |15   |
> |20   |
> |19   |
> +-----+
> scala> carbon.sql("create datamap mv_hav using 'mv' as select empno from 
> originTable having salary>1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain select empno from originTable having 
> salary>1").show(200,false)
> +---+
> |plan |
> +---+
> |== CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 1
>  - filter: (salary <> null and salary > 1)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
>  |
> |== Physical Plan ==
> *Project [empno#1131]
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, 
> Table name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> StructField(salary,IntegerType,true))) ] 

[jira] [Updated] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-27 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2537:
--
Attachment: image-2018-06-27-11-53-49-587.png

> MV Dataset - User queries with 'having' condition is not accessing the data 
> from the MV datamap.
> 
>
> Key: CARBONDATA-2537
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2537
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 Node Opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Assignee: xubo245
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv, image-2018-05-25-15-50-23-903.png, 
> image-2018-06-27-11-53-49-587.png
>
>
> User queries with a 'having' condition do not read data from the MV 
> datamap. They read the data from the main table instead.
> Test queries - spark shell:
> scala>carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 'org.apache.carbondata.format'").show()
> ++
> ||
> ++
> ++
> scala>carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("select empno from originTable having 
> salary>1").show(200,false)
> +-----+
> |empno|
> +-----+
> |14   |
> |15   |
> |20   |
> |19   |
> +-----+
> scala> carbon.sql("create datamap mv_hav using 'mv' as select empno from 
> originTable having salary>1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain select empno from originTable having 
> salary>1").show(200,false)
> +---+
> |plan |
> +---+
> |== CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 1
>  - filter: (salary <> null and salary > 1)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
>  |
> |== Physical Plan ==
> *Project [empno#1131]
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, 
> Table name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> StructField(salary,IntegerType,true))) ] default.origintable[empno#1131] 
> PushedFilters: 

[jira] [Created] (CARBONDATA-2580) MV Datamap - Cannot create two MV`s with same name in different databases.

2018-06-05 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2580:
-

 Summary: MV Datamap - Cannot create two MV`s with same name in 
different databases.
 Key: CARBONDATA-2580
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2580
 Project: CarbonData
  Issue Type: Bug
  Components: data-load, data-query
 Environment: 3 Node Opensource ANT cluster
Reporter: Prasanna Ravichandran


Two MVs with the same name cannot be created in different databases. If you 
create an MV datamap, say MV1, in the default database, you cannot use the 
same name (MV1) to define another MV datamap in any other database.

Test queries: 

scala> carbon.sql("create table ratish(id int, name string) stored by 
'carbondata'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 1,'ram'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 2,'ravi'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 3,'raghu'").show(200,false)
++
||
++
++


scala> carbon.sql("create datamap radi using 'mv' as select name from 
ratish").show(200,false)
++
||
++
++


scala> carbon.sql("rebuild datamap radi").show(200,false)
++
||
++
++


scala> carbon.sql("explain select name from ratish").show(200,false)
+--+
|plan |
+--+
|== CarbonData Profiler ==
Table Scan on radi_table
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
|== Physical Plan ==
*Project [ratish_name#13790 AS name#13818]
+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table 
name :radi_table, Schema 
:Some(StructType(StructField(ratish_name,StringType,true))) ] 
default.radi_table[ratish_name#13790]|
+--+


scala> carbon.sql("create database rad").show(200,false)
++
||
++
++


scala> carbon.sql("use rad").show(200,false)
++
||
++
++


scala> carbon.sql("create table ratish(id int, name string) stored by 
'carbondata'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 1,'ram'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 2,'ravi'").show(200,false)
++
||
++
++


scala> carbon.sql("insert into ratish select 3,'raghu'").show(200,false)
++
||
++
++


scala> carbon.sql("create datamap radi using 'mv' as select name from 
ratish").show(200,false)
java.io.IOException: DataMap with name radi already exists in storage
 at 
org.apache.carbondata.core.metadata.schema.table.DiskBasedDMSchemaStorageProvider.saveSchema(DiskBasedDMSchemaStorageProvider.java:70)
 at 
org.apache.carbondata.core.datamap.DataMapStoreManager.saveDataMapSchema(DataMapStoreManager.java:158)
 at 
org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:115)
 at 
org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
 at 
org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
 at 
org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
 at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
 at 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
 at 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
 at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
 at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
 ... 48 elided
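
The exception above is raised while the datamap schema is being saved, which suggests (but does not confirm) that uniqueness is checked on the datamap name alone. The following is a minimal sketch of that difference, not CarbonData code: MvSchema, NameKeyedRegistry and DbScopedRegistry are hypothetical names introduced only for illustration. It contrasts a registry keyed by name with one keyed by (database, name).

```scala
// Hypothetical sketch: none of these types exist in CarbonData.
import scala.collection.mutable

final case class MvSchema(database: String, name: String, query: String)

// Registry keyed by the datamap name alone: names are global across databases.
final class NameKeyedRegistry {
  private val schemas = mutable.Map.empty[String, MvSchema]

  def save(s: MvSchema): Either[String, Unit] =
    if (schemas.contains(s.name)) {
      Left(s"DataMap with name ${s.name} already exists in storage")
    } else {
      schemas(s.name) = s
      Right(())
    }
}

// Registry keyed by (database, name): the same name can be reused per database.
final class DbScopedRegistry {
  private val schemas = mutable.Map.empty[(String, String), MvSchema]

  def save(s: MvSchema): Either[String, Unit] = {
    val key = (s.database, s.name)
    if (schemas.contains(key)) {
      Left(s"DataMap with name ${s.name} already exists in database ${s.database}")
    } else {
      schemas(key) = s
      Right(())
    }
  }
}

object MvNameScopeDemo {
  def main(args: Array[String]): Unit = {
    val inDefault = MvSchema("default", "radi", "select name from ratish")
    val inRad = MvSchema("rad", "radi", "select name from ratish")

    val global = new NameKeyedRegistry
    println(global.save(inDefault)) // accepted
    println(global.save(inRad))     // rejected: the name collides across databases

    val scoped = new DbScopedRegistry
    println(scoped.save(inDefault)) // accepted
    println(scoped.save(inRad))     // accepted: different database
  }
}
```

With the name-only key, the second save fails in the same way as the report; with the database-scoped key, both succeed. Whether the actual fix should scope names per database is a design decision for the project.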

 

 





[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2576:
--
Description: 
MV does not work correctly if the same datamap contains more than 3 aggregate 
functions. It works with up to 3 aggregate functions on the same MV. Please 
see the attached document for more details.
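
In the failing plan quoted in this report, max, sum and min are re-applied on top of the stored MV columns, while avg_attendance is selected without an aggregate. One possible reason avg cannot simply be wrapped like the others is general aggregate arithmetic: an average of averages is not the overall average, so a rollup needs a stored sum and count. The sketch below shows only that arithmetic. PartialRow and the sample numbers are made up, and it does not describe CarbonData's rewrite logic.

```scala
// Illustrative only: PartialRow and the sample values are assumptions for this sketch.
final case class PartialRow(empno: Int, maxV: Int, sumV: Long, minV: Int, count: Long) {
  // The average is derived from sum and count rather than stored directly.
  def avg: Double = sumV.toDouble / count
}

object RollupDemo {
  // Merge two pre-aggregated rows that share the same key.
  // max, sum and min can be re-aggregated directly; avg needs sum and count.
  def merge(a: PartialRow, b: PartialRow): PartialRow =
    PartialRow(
      a.empno,
      math.max(a.maxV, b.maxV),
      a.sumV + b.sumV,
      math.min(a.minV, b.minV),
      a.count + b.count
    )

  def main(args: Array[String]): Unit = {
    // Two partial results for empno 11: values {10, 20, 30} and {100}.
    val p1 = PartialRow(11, 30, 60L, 10, 3L)
    val p2 = PartialRow(11, 100, 100L, 100, 1L)

    val merged = merge(p1, p2)
    val avgOfAvgs = (p1.avg + p2.avg) / 2

    // prints "true avg = 40.0, avg of avgs = 60.0"
    println(s"true avg = ${merged.avg}, avg of avgs = $avgOfAvgs")
  }
}
```

The two numbers differ because the partial groups have different row counts, which is why a stored average on its own cannot be re-aggregated the way max, sum and min can.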

Test queries:

 

scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

++
||
++
++

 

 

scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)

++
||
++
++

 

 

scala> carbon.sql("explain select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

org.apache.spark.sql.AnalysisException: expression 
'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
group by, nor is it an aggregate function. Add to group by or wrap in first() 
(or first_value) if you don't care which value you get.;;

Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, 
max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) 
AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]

+- SubqueryAlias datamap_comp_maxsumminavg_table

   +- 
Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
 CarbonDatasourceHadoopRelation [ Database name :default, Table name 
:datamap_comp_maxsumminavg_table, Schema 
:Some(StructType(StructField(origintable_empno,IntegerType,true), 
StructField(max_projectenddate,TimestampType,true), 
StructField(sum_salary,LongType,true), 
StructField(min_projectjoindate,TimestampType,true), 
StructField(avg_attendance,DoubleType,true))) ]

 

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)

  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)

  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)

  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27)

  at 

[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2576:
--
Attachment: data.csv

> MV Datamap - MV is not working fine if there is more than 3 aggregate 
> function in the same datamap.
> ---
>
> Key: CARBONDATA-2576
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2576
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CARBONDATA., MV, Materialistic_Views
> Attachments: From 4th aggregate function -error shown.docx, data.csv
>
>
> MV does not work correctly if the same datamap contains more than 3 
> aggregate functions.
> Test queries:
>  
> scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
> select 
> empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) 
> from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
>  
>  
> scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)
> ++
> ||
> ++
> ++
>  
>  
> scala> carbon.sql("explain select 
> empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) 
> from originTable group by empno").show(200,false)
> org.apache.spark.sql.AnalysisException: expression 
> 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
> group by, nor is it an aggregate function. Add to group by or wrap in first() 
> (or first_value) if you don't care which value you get.;;
> Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, 
> max(max_projectenddate#2926) AS max(projectenddate)#3003, 
> sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
> min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]
> +- SubqueryAlias datamap_comp_maxsumminavg_table
>    +- 
> Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
>  CarbonDatasourceHadoopRelation [ Database name :default, Table name 
> :datamap_comp_maxsumminavg_table, Schema 
> :Some(StructType(StructField(origintable_empno,IntegerType,true), 
> StructField(max_projectenddate,TimestampType,true), 
> StructField(sum_salary,LongType,true), 
> StructField(min_projectjoindate,TimestampType,true), 
> StructField(avg_attendance,DoubleType,true))) ]
>  
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
>   at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)
>   at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>   at 
> 

[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2576:
--
Description: 
MV does not work correctly if the same datamap contains more than 3 aggregate 
functions. It works with up to 3 aggregate functions on the same MV.

Test queries:

 

scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

++
||
++
++

 

 

scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)

++
||
++
++

 

 

scala> carbon.sql("explain select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

org.apache.spark.sql.AnalysisException: expression 
'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
group by, nor is it an aggregate function. Add to group by or wrap in first() 
(or first_value) if you don't care which value you get.;;

Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, 
max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) 
AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]

+- SubqueryAlias datamap_comp_maxsumminavg_table

   +- 
Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
 CarbonDatasourceHadoopRelation [ Database name :default, Table name 
:datamap_comp_maxsumminavg_table, Schema 
:Some(StructType(StructField(origintable_empno,IntegerType,true), 
StructField(max_projectenddate,TimestampType,true), 
StructField(sum_salary,LongType,true), 
StructField(min_projectjoindate,TimestampType,true), 
StructField(avg_attendance,DoubleType,true))) ]

 

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)

  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)

  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)

  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27)

  at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)

  at 

[jira] [Commented] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500225#comment-16500225
 ] 

Prasanna Ravichandran commented on CARBONDATA-2576:
---

Please find the queries for the base table creation:

CREATE TABLE originTable (empno int, empname String, designation String, doj 
Timestamp,
workgroupcategory int, workgroupcategoryname String, deptno int, deptname 
String,
projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance 
int,
utilization int,salary int)
STORED BY 'org.apache.carbondata.format';

LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE 
originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
'"','timestampformat'='dd-MM-');

The data.csv file is also attached.

 

> MV Datamap - MV is not working fine if there is more than 3 aggregate 
> function in the same datamap.
> ---
>
> Key: CARBONDATA-2576
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2576
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CARBONDATA., MV, Materialistic_Views
> Attachments: From 4th aggregate function -error shown.docx
>
>
> MV does not work correctly if the same datamap contains more than 3 
> aggregate functions.
> Test queries:
>  
> scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
> select 
> empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) 
> from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
>  
>  
> scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)
> ++
> ||
> ++
> ++
>  
>  
> scala> carbon.sql("explain select 
> empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) 
> from originTable group by empno").show(200,false)
> org.apache.spark.sql.AnalysisException: expression 
> 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
> group by, nor is it an aggregate function. Add to group by or wrap in first() 
> (or first_value) if you don't care which value you get.;;
> Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, 
> max(max_projectenddate#2926) AS max(projectenddate)#3003, 
> sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
> min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]
> +- SubqueryAlias datamap_comp_maxsumminavg_table
>    +- 
> Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
>  CarbonDatasourceHadoopRelation [ Database name :default, Table name 
> :datamap_comp_maxsumminavg_table, Schema 
> :Some(StructType(StructField(origintable_empno,IntegerType,true), 
> StructField(max_projectenddate,TimestampType,true), 
> StructField(sum_salary,LongType,true), 
> StructField(min_projectjoindate,TimestampType,true), 
> StructField(avg_attendance,DoubleType,true))) ]
>  
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
>   at 
> 

[jira] [Created] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2576:
-

 Summary: MV Datamap - MV is not working fine if there is more than 
3 aggregate function in the same datamap.
 Key: CARBONDATA-2576
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2576
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Reporter: Prasanna Ravichandran
 Attachments: From 4th aggregate function -error shown.docx

MV does not work correctly if the same datamap contains more than 3 aggregate 
functions.

Test queries:

 

scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as 
select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

++

||

++

++

 

 

scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)

++

||

++

++

 

 

scala> carbon.sql("explain select 
empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from 
originTable group by empno").show(200,false)

org.apache.spark.sql.AnalysisException: expression 
'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the 
group by, nor is it an aggregate function. Add to group by or wrap in first() 
(or first_value) if you don't care which value you get.;;

Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, 
max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) 
AS sum(salary)#3004L, min(min_projectjoindate#2928) AS 
min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]

+- SubqueryAlias datamap_comp_maxsumminavg_table

   +- 
Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929]
 CarbonDatasourceHadoopRelation [ Database name :default, Table name 
:datamap_comp_maxsumminavg_table, Schema 
:Some(StructType(StructField(origintable_empno,IntegerType,true), 
StructField(max_projectenddate,TimestampType,true), 
StructField(sum_salary,LongType,true), 
StructField(min_projectjoindate,TimestampType,true), 
StructField(avg_attendance,DoubleType,true))) ]

 

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)

  at scala.collection.immutable.List.foreach(List.scala:381)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)

  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)

  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)

  at 
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)

  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)

  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72)

  at 
org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)

  at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27)

  at 
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)

  at 
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)

  at 
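
The failing plan above re-aggregates three of the stored columns (max, sum, min) but selects `avg_attendance` as a bare column, which is what the analyzer rejects. One possible reason, offered here as an assumption rather than a confirmed diagnosis, is that an average cannot be recombined from stored averages alone; it needs the underlying sum and count. A minimal sketch in plain Scala (no Spark or CarbonData needed; `Partial`, `mergeAvg` and `avgOfAvgs` are hypothetical names):

```scala
// Hypothetical names; plain Scala, no Spark or CarbonData required.
// A partial aggregate keeps sum and count so that averages can be merged.
final case class Partial(sum: Long, count: Long) {
  def avg: Double = sum.toDouble / count
}

// Merging partials is exact: add the sums and add the counts.
def mergeAvg(parts: Seq[Partial]): Double = {
  val total = parts.foldLeft(Partial(0L, 0L)) { (acc, p) =>
    Partial(acc.sum + p.sum, acc.count + p.count)
  }
  total.avg
}

// Averaging the stored averages weights every partial equally, which is
// wrong as soon as the partials cover different numbers of rows.
def avgOfAvgs(parts: Seq[Partial]): Double =
  parts.map(_.avg).sum / parts.size

val parts = Seq(Partial(sum = 10L, count = 1L), Partial(sum = 20L, count = 4L))
println(mergeAvg(parts))   // 30 / 5 = 6.0
println(avgOfAvgs(parts))  // (10.0 + 5.0) / 2 = 7.5
```

Under this reading, an MV that should serve avg() would store sum and count columns and derive the average at query time.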

[jira] [Created] (CARBONDATA-2574) MV Datamap - MV is not working if there is aggregate function with group by and without any projections.

2018-06-04 Thread Prasanna Ravichandran (JIRA)
Prasanna Ravichandran created CARBONDATA-2574:
-

 Summary: MV Datamap - MV is not working if there is aggregate 
function with group by and without any projections.
 Key: CARBONDATA-2574
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2574
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
 Environment: 3 Node Opensource ANT cluster.
Reporter: Prasanna Ravichandran
 Attachments: MV_aggregate_without_projection_and_with_groupby.docx, 
data.csv

A user query does not fetch data from the MV datamap if it uses an aggregate 
function with group by but projects no other columns.

Test queries (in spark-shell):

 

scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, 
designation String, doj Timestamp,workgroupcategory int, workgroupcategoryname 
String, deptno int, deptname String,projectcode int, projectjoindate Timestamp, 
projectenddate Timestamp,attendance int,utilization int,salary int) STORED BY 
'org.apache.carbondata.format'").show(200,false)

++

||

++

++

 

scala> carbon.sql("LOAD DATA local inpath 
'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
'\"','timestampformat'='dd-MM-')").show(200,false)

++

||

++

++

 

 

scala> carbon.sql("create datamap Mv_misscol using 'mv' as select sum(salary) 
from origintable group by empno").show(200,false)

++

||

++

++

 

 

scala> carbon.sql("rebuild datamap Mv_misscol").show(200,false)

++

||

++

++

 

 

scala> carbon.sql("explain select sum(salary) from origintable group by 
empno").show(200,false)

+---+

|plan |

+---+

|== CarbonData Profiler ==

Table Scan on origintable

 - total blocklets: 1

 - filter: none

 - pruned by Main DataMap

    - skipped blocklets: 0

  

[jira] [Updated] (CARBONDATA-2541) MV Dataset - When MV satisfy filter condition but not exact same condition given during MV creation, then the user query is not accessing the data from MV.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2541:
--
Attachment: data.csv

> MV Dataset - When MV satisfy filter condition but not exact same condition 
> given during MV creation, then the user query is not accessing the data from 
> MV.
> ---
>
> Key: CARBONDATA-2541
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2541
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv
>
>
> MV Dataset - When the MV can satisfy the user query's filter condition but 
> the condition is not exactly the one given during MV creation, the user 
> query does not access the data from the MV.
> Test queries - spark shell:
> scala>carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 'org.apache.carbondata.format'").show()
> ++
> ||
> ++
> ++
>  
> scala>carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("create datamap mv_project3 using 'mv' as select 
> projectenddate,empno from originTable where empno>10").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql(" rebuild datamap mv_project3").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql(" explain select projectenddate,empno from originTable 
> where empno>15").show(200,false)
> +-+
> |plan |
> +-+
> |== CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 2
>  - filter: (empno <> null and empno > 15)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
>  |
> |== Physical Plan ==
> *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table 
> name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> StructField(salary,IntegerType,true))) ] 
> default.origintable[projectenddate#3095,empno#3085] PushedFilters: 
> [IsNotNull(empno), GreaterThan(empno,15)]|
> 
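
The expectation in this report is that a query filtered on `empno>15` could be served from an MV built with `empno>10`, because every row with empno above 15 is also above 10; the stricter bound would then be re-applied to the MV rows. A minimal sketch of that containment check for a single `>` predicate follows. The names `GreaterThan` and `filtersOnMv` are hypothetical, and this is not CarbonData's MV matching code.

```scala
// Hypothetical model of a single "column > bound" filter; this is not
// CarbonData's MV matching code.
final case class GreaterThan(column: String, bound: Int)

// Returns None when the MV cannot answer the query, otherwise the filters
// that still have to be applied to the MV rows.
def filtersOnMv(mv: GreaterThan, query: GreaterThan): Option[List[GreaterThan]] =
  if (mv.column != query.column || query.bound < mv.bound) None // MV lacks rows
  else if (query.bound == mv.bound) Some(Nil)                   // exact match
  else Some(List(query))                                        // re-apply stricter bound

val mvFilter = GreaterThan("empno", 10)
println(filtersOnMv(mvFilter, GreaterThan("empno", 15))) // usable, keep empno > 15
println(filtersOnMv(mvFilter, GreaterThan("empno", 5)))  // not usable
```

A real matcher would also need to handle conjunctions, other comparison operators and null semantics; the sketch covers only the case described in the issue.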

[jira] [Updated] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2539:
--
Attachment: data.csv

> MV Dataset - Subqueries is not accessing the data from the MV datamap.
> --
>
> Key: CARBONDATA-2539
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2539
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Priority: Minor
> Attachments: data.csv
>
>
> The inner subquery does not access the data from the MV datamap. It reads 
> the data from the main table instead.
> Test queries - Spark shell:
> scala> carbon.sql("drop table if exists origintable").show()
> ++
> ||
> ++
> ++
>  scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("drop datamap datamap_subqry").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("create datamap datamap_subqry using 'mv' as select 
> min(salary) from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN 
> (select min(salary) from originTable group by empno ) group by 
> empname").show(200,false)
> ++
> |plan |
> 

[jira] [Updated] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2537:
--
Attachment: data.csv

> MV Dataset - User queries with 'having' condition is not accessing the data 
> from the MV datamap.
> 
>
> Key: CARBONDATA-2537
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2537
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 Node Opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv, image-2018-05-25-15-50-23-903.png
>
>
> User queries with a 'having' condition do not access the data from the MV 
> datamap. They read the data from the main table instead.
> Test queries - spark shell:
> scala>carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 'org.apache.carbondata.format'").show()
> ++
> ||
> ++
> ++
> scala>carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
> ||
> ++
> ++
> scala> carbon.sql("select empno from originTable having 
> salary>1").show(200,false)
> +-+
> |empno|
> +-+
> |14 |
> |15 |
> |20 |
> |19 |
> +-+
> scala> carbon.sql("create datamap mv_hav using 'mv' as select empno from 
> originTable having salary>1").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain select empno from originTable having 
> salary>1").show(200,false)
> +---+
> |plan |
> +---+
> |== CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 1
>  - filter: (salary <> null and salary > 1)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
>  |
> |== Physical Plan ==
> *Project [empno#1131]
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, 
> Table name :origintable, Schema 
> :Some(StructType(StructField(empno,IntegerType,true), 
> StructField(empname,StringType,true), 
> StructField(designation,StringType,true), 
> StructField(doj,TimestampType,true), 
> StructField(workgroupcategory,IntegerType,true), 
> StructField(workgroupcategoryname,StringType,true), 
> StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), 
> StructField(projectcode,IntegerType,true), 
> StructField(projectjoindate,TimestampType,true), 
> StructField(projectenddate,TimestampType,true), 
> StructField(attendance,IntegerType,true), 
> StructField(utilization,IntegerType,true), 
> StructField(salary,IntegerType,true))) ] default.origintable[empno#1131] 
> PushedFilters: [IsNotNull(salary), GreaterThan(salary,1)]|
> 

[jira] [Updated] (CARBONDATA-2536) MV Dataset - When user query has substring() of column under group by, which is same as the MV group by column, then the user query is not accessing the data from th

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2536:
--
Attachment: data.csv

> MV Dataset - When user query has substring() of column under group by, which 
> is same as the MV group by column, then the user query is not accessing the 
> data from the MV datamap table.
> 
>
> Key: CARBONDATA-2536
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2536
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT Cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv
>
>
> MV Dataset - When the user query groups by substring() of a column, and that 
> column is the MV's group by column, the user query does not access the data 
> from the MV datamap table. It reads the data from the main table only.
> Test query:
> carbon.sql("CREATE TABLE originTable (empno int, empname String, designation 
> String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, 
> deptno int, deptname String, projectcode int, projectjoindate Timestamp, 
> projectenddate Timestamp,attendance int, utilization int,salary int) STORED 
> BY 'org.apache.carbondata.format'").show()
> ++
>  
> ++
>  ++
>  
> carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' 
> INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show()
> ++
>  
> ++
>  ++
>  
> scala> carbon.sql("Create datamap m2 using 'mv' as select sum(salary) from 
> originTable group by deptname").show(200,false)
>  ++
>  
> ++
>  ++
> scala> carbon.sql("rebuild datamap m2").show(200,false)
>  ++
>  
> ++
>  ++
>  
> scala> carbon.sql("explain select sum(salary) from originTable group by 
> substring(deptname,2,2)")
>  res60: org.apache.spark.sql.DataFrame = [plan: string]
> scala> carbon.sql("explain select sum(salary) from originTable group by 
> substring(deptname,2,2)").show(200,false)
>  
> +-+
> |plan|
> +-+
> |== CarbonData Profiler ==
>  Table Scan on origintable
>  - total blocklets: 1
>  - filter: none
>  - pruned by Main DataMap
>  - skipped blocklets: 0|
> |== Physical Plan ==
>  *HashAggregate(keys=[substring(deptname#1138, 2, 2)#1255], 
> 
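
The expectation behind this report is that sums grouped by `deptname` can be rolled up to any grouping derived from `deptname`, such as `substring(deptname,2,2)`, because each deptname group falls into exactly one substring group. A minimal sketch in plain Scala with made-up rows follows; it assumes the MV stores the group-by column next to the sum, and `sqlSubstring` is a hypothetical helper that mimics the 1-based SQL substring for positive start positions only.

```scala
// Made-up rows standing in for originTable: (deptname, salary).
val rows = Seq(("network", 100L), ("network", 50L), ("netops", 70L), ("security", 30L))

// SQL substring is 1-based; this helper covers positive start positions only.
def sqlSubstring(s: String, pos: Int, len: Int): String =
  s.slice(pos - 1, pos - 1 + len)

// What an MV like m2 would hold: sum(salary) grouped by deptname.
val mv: Map[String, Long] =
  rows.groupBy(_._1).map { case (dept, rs) => dept -> rs.map(_._2).sum }

// Roll the MV up to the coarser key instead of rescanning the base rows.
val fromMv: Map[String, Long] =
  mv.groupBy { case (dept, _) => sqlSubstring(dept, 2, 2) }
    .map { case (key, groups) => key -> groups.values.sum }

// The same aggregate computed directly from the base rows.
val fromBase: Map[String, Long] =
  rows.groupBy(r => sqlSubstring(r._1, 2, 2))
    .map { case (key, rs) => key -> rs.map(_._2).sum }

println(fromMv == fromBase) // both routes give the same sums per substring key
```

The roll-up works for sum because sums of disjoint groups can simply be added; it would not carry over unchanged to aggregates such as avg.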

[jira] [Updated] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-06-04 Thread Prasanna Ravichandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran updated CARBONDATA-2534:
--
Attachment: data.csv

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Attachments: MV_substring.docx, data.csv
>
>
> MV creation does not work with the substring function. We get a 
> spark.sql.AnalysisException while trying to create an MV with the substring 
> and an aggregate function. 
> *Spark-shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a 
> column whose name contains commas in Hive metastore. Table: 
> `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316)
>  at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  at 
> org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>  at 
> org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103)
>  at 
> org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
>  at 
> org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  ... 48 elided
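
The exception shows the generated column name `substring_empname,_2,_5` still carrying the commas from the expression's argument list, which the Hive metastore rejects. One way to avoid such names is to replace every character outside letters, digits and underscore before using the expression text as a column name. This is a sketch only, not the fix applied in CarbonData, and `safeColumnName` is a hypothetical helper.

```scala
// Hypothetical helper: turn an expression string into a column name that
// contains only lowercase letters, digits and underscores.
def safeColumnName(expr: String): String =
  expr.toLowerCase
    .replaceAll("[^a-z0-9_]+", "_") // commas, parentheses, spaces -> "_"
    .replaceAll("_+", "_")          // collapse runs of underscores
    .stripPrefix("_")
    .stripSuffix("_")

println(safeColumnName("substring(empname,2,5)"))  // substring_empname_2_5
println(safeColumnName("substring_empname,_2,_5")) // substring_empname_2_5
```

Different expressions can map to the same sanitized name, so a real implementation would also need a way to keep generated names unique, for example by adding a suffix.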





[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-06-04 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500032#comment-16500032
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

Base table queries:

CREATE TABLE originTable (empno int, empname String, designation String, doj 
Timestamp,
workgroupcategory int, workgroupcategoryname String, deptno int, deptname 
String,
projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance 
int,
utilization int,salary int)
STORED BY 'org.apache.carbondata.format';

LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE 
originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
'"','timestampformat'='dd-MM-');

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Attachments: MV_substring.docx
>
>
> MV creation does not work with the substring function. We get a 
> spark.sql.AnalysisException while trying to create an MV with the substring 
> and an aggregate function. 
> *Spark-shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a 
> column whose name contains commas in Hive metastore. Table: 
> `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316)
>  at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at 
> org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  at 
> org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126)
>  at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>  at 
> org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103)
>  at 
> org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
>  at 
> org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
>  at 
> org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> 
