[jira] [Updated] (CARBONDATA-4317) TPCDS perf issues

2021-12-09 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4317:
-
Description: 
The following issues have degraded TPCDS query performance:
 # If a dynamic filter is not present in the partitionFilters set, that filter 
is skipped instead of being pushed down to Spark.
 # In some cases, nodes such as Exchange / Shuffle are not reused, because the 
CarbonDataSourceScan plans do not match.
 # Accessing the metadata on the canonicalized plan throws an NPE.
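To make the first issue concrete, here is a hypothetical query shape in which a dynamic partition filter arises; the table and column names are illustrative, not taken from the report:

{code:sql}
-- fact_sales is partitioned by dt; the filter on dim_date only yields
-- concrete dt values at runtime (dynamic partition pruning), so the
-- resulting dynamic filter must still be pushed down to the scan
SELECT f.item_id, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_date d ON f.dt = d.dt
WHERE d.year = 2021
GROUP BY f.item_id
{code}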

> TPCDS perf issues
> -
>
> Key: CARBONDATA-4317
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4317
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> The following issues have degraded TPCDS query performance:
>  # If a dynamic filter is not present in the partitionFilters set, that 
> filter is skipped instead of being pushed down to Spark.
>  # In some cases, nodes such as Exchange / Shuffle are not reused, because 
> the CarbonDataSourceScan plans do not match.
>  # Accessing the metadata on the canonicalized plan throws an NPE.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CARBONDATA-4317) TPCDS perf issues

2021-12-09 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4317:


 Summary: TPCDS perf issues
 Key: CARBONDATA-4317
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4317
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (CARBONDATA-4293) Table without External keyword is created as external table in local mode

2021-09-29 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4293:


 Summary: Table without External keyword is created as external 
table in local mode
 Key: CARBONDATA-4293
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4293
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418713#comment-17418713
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--

Thanks for the clarification.

So the initial issue (as mentioned in the description) is without the LOCATION 
keyword.

Please check the following:
 # Did any exception occur during the insert?
 # Does the scenario work fine with a non-partitioned table?
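For the second check, a minimal non-partitioned variant of the reporter's table could isolate the partition handling; the `_nopart` name is illustrative:

{code:sql}
-- same columns as mark_for_del_bug, but without PARTITIONED BY
CREATE TABLE lior_carbon_tests.mark_for_del_bug_nopart (
  timestamp string,
  name string,
  dt string,
  hr string
)
STORED AS carbondata;

INSERT INTO lior_carbon_tests.mark_for_del_bug_nopart
SELECT '2021-07-07T13:23:56.012+00:00', 'spark', '2021-07-07', '13';

-- if this segment shows Success, the problem is partition-specific
SHOW SEGMENTS FOR TABLE lior_carbon_tests.mark_for_del_bug_nopart;
{code}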

> Insert data to table with a partitions resulting in 'Marked for Delete' 
> segment in Spark in EMR
> ---
>
> Key: CARBONDATA-4279
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4279
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.3.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hue 4.4.0, Spark 2.4.5,JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.3.0-SNAPSHOT
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Blocker
>
> as described [here|https://github.com/apache/carbondata/issues/4212]
> After the commit 
> [https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7]
> I have successfully created a table with partitions, but when I try to insert 
> data, the job ends with success, but the segment is marked as "Marked for 
> Delete".
> I am running:
> {code:sql}
> CREATE TABLE lior_carbon_tests.mark_for_del_bug(
> timestamp string,
> name string
> )
> STORED AS carbondata
> PARTITIONED BY (dt string, hr string)
> {code}
> {code:sql}
> INSERT INTO lior_carbon_tests.mark_for_del_bug select 
> '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
> {code}
> {code:sql}
> select * from lior_carbon_tests.mark_for_del_bug
> {code}
> gives:
> {code:java}
> +---------+----+---+---+
> |timestamp|name| dt| hr|
> +---------+----+---+---+
> +---------+----+---+---+
> {code}
> And
> {code:java}
> show segments for TABLE lior_carbon_tests.mark_for_del_bug
> {code}
> gives
>  
> {code:java}
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> |ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> |0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
> +---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
> {code}
>  
> I took a look at the folder structure in S3 and it seems fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418667#comment-17418667
 ] 

Indhumathi Muthumurugesh edited comment on CARBONDATA-4279 at 9/22/21, 3:41 PM:


Hi, the steps that you have mentioned in the description and in the comment are 
different.
 # Was the database created with "{color:#8c8c8c}CREATE DATABASE 
lior_carbon_tests LOCATION '/home/root1/newdb'{color}"?
 # Was the table created with the LOCATION keyword or not?
 # Does the number of columns that you are trying to insert match the number of 
columns in the table (as seen from the DESCRIBE command)? Can you share the 
exception stack trace for the insert failure?
 # If the table and database are created without specifying LOCATION, does the 
issue still occur?
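The last question could be checked with something like the following; the database name is illustrative:

{code:sql}
-- no LOCATION on either the database or the table
CREATE DATABASE lior_carbon_tests_default;

CREATE TABLE lior_carbon_tests_default.mark_for_del_bug (
  timestamp string,
  name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string);

INSERT INTO lior_carbon_tests_default.mark_for_del_bug
SELECT '2021-07-07T13:23:56.012+00:00', 'spark', '2021-07-07', '13';

SHOW SEGMENTS FOR TABLE lior_carbon_tests_default.mark_for_del_bug;
{code}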


was (Author: indhumathi27):
Hi, the steps that you have mentioned in the description and in the comment are 
different.
 # Was the database created with "{color:#8c8c8c}CREATE DATABASE 
lior_carbon_tests LOCATION '/home/root1/newdb'{color}"?
 # Was the table created with the LOCATION keyword or not?
 # Does the number of columns that you are trying to insert match the number of 
columns in the table (as seen from the DESCRIBE command)? Can you share the 
exception stack trace for the insert failure?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418667#comment-17418667
 ] 

Indhumathi Muthumurugesh edited comment on CARBONDATA-4279 at 9/22/21, 3:40 PM:


Hi, the steps that you have mentioned in the description and in the comment are 
different.
 # Was the database created with "{color:#8c8c8c}CREATE DATABASE 
lior_carbon_tests LOCATION '/home/root1/newdb'{color}"?
 # Was the table created with the LOCATION keyword or not?
 # Does the number of columns that you are trying to insert match the number of 
columns in the table (as seen from the DESCRIBE command)? Can you share the 
exception stack trace for the insert failure?


was (Author: indhumathi27):
Hi, the steps that you have mentioned in the description and in the comment are 
different.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418667#comment-17418667
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--

Hi, the steps that you have mentioned in the description and in the comment are 
different.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418560#comment-17418560
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--

Can you please share the DESCRIBE FORMATTED results for the table?
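For example:

{code:sql}
DESCRIBE FORMATTED lior_carbon_tests.mark_for_del_bug;
{code}

Among other details, this shows the table's registered location and whether it was created as an external table.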




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-22 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418478#comment-17418478
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--

Does CREATE TABLE with LOCATION not work in your cluster?
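That is, whether a statement like the following succeeds; the path and table name are illustrative:

{code:sql}
CREATE TABLE lior_carbon_tests.location_check (
  timestamp string,
  name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)
LOCATION 's3a://my-bucket/CarbonDataTests/location_check';
{code}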




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-20 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549
 ] 

Indhumathi Muthumurugesh edited comment on CARBONDATA-4279 at 9/20/21, 9:56 AM:


Hi, I have the following question for this Jira:

=> Is the table created with `LOCATION '' ` or not?

Also, this 
commit ([https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7])
 already has a test case added, and it appears to work fine there.

Can you check whether there is any exception during the insert?


was (Author: indhumathi27):
Hi, I have the following question for this Jira:

=> Is the table created with `LOCATION '' ` or not?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-20 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549
 ] 

Indhumathi Muthumurugesh edited comment on CARBONDATA-4279 at 9/20/21, 9:53 AM:


Hi, I have the following question for this Jira:

=> Is the table created with `LOCATION '' ` or not?


was (Author: indhumathi27):
Hi, I have the following question for this Jira:

=> Is the table created with `LOCATION '' ` or not?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-20 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549
 ] 

Indhumathi Muthumurugesh edited comment on CARBONDATA-4279 at 9/20/21, 9:53 AM:


Hi, I have the following question for this Jira:

=> Is the table created with `LOCATION '' ` or not?


was (Author: indhumathi27):
Hi, I have the following questions for this JIRA:
 # Is the table created with `LOCATION '' ` or not?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4279) Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR

2021-09-20 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417549#comment-17417549
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4279:
--

Hi, I have the following questions for this JIRA:
 # Is the table created with `LOCATION '' ` or not?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-31 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4273:
-
Comment: was deleted

(was: Is the location empty, or does it have partition folders which hold 
carbon data and index files?)

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar compiled with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4273) Cannot create table with partitions in Spark in EMR

2021-08-31 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407306#comment-17407306
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4273:
--

Is the location empty, or does it contain a partition folder which holds carbon 
data and index files?

> Cannot create table with partitions in Spark in EMR
> ---
>
> Key: CARBONDATA-4273
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4273
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 2.2.0
> Environment: Release label:emr-5.24.1
> Hadoop distribution:Amazon 2.8.5
> Applications:
> Hive 2.3.4, Pig 0.17.0, Hue 4.4.0, Flink 1.8.0, Spark 2.4.2, Presto 0.219, 
> JupyterHub 0.9.6
> Jar complied with:
> apache-carbondata:2.2.0
> spark:2.4.5
> hadoop:2.8.3
>Reporter: Bigicecream
>Priority: Critical
>  Labels: EMR, spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
>  
> When trying to create a table like this:
> {code:sql}
> CREATE TABLE IF NOT EXISTS will_not_work(
> timestamp string,
> name string
> )
> PARTITIONED BY (dt string, hr string)
> STORED AS carbondata
> LOCATION 's3a://my-bucket/CarbonDataTests/will_not_work'
> {code}
> I get the following error:
> {noformat}
> org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException: 
> Partition is not supported for external table
>   at 
> org.apache.spark.sql.parser.CarbonSparkSqlParserUtil$.buildTableInfoFromCatalogTable(CarbonSparkSqlParserUtil.scala:219)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableInfo(CarbonSource.scala:235)
>   at 
> org.apache.spark.sql.CarbonSource$.createTableMeta(CarbonSource.scala:394)
>   at 
> org.apache.spark.sql.execution.command.table.CarbonCreateDataSourceTableCommand.processMetadata(CarbonCreateDataSourceTableCommand.scala:69)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand$$anonfun$run$1.apply(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.Auditable$class.runWithAudit(package.scala:118)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
>   at 
> org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   ... 64 elided
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4239) Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly

2021-07-14 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380485#comment-17380485
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4239:
--

Hi Suyash,

With incremental data loading in MV, the new incoming data (a new Load/Insert) 
is aggregated and written to a new segment. It is not appended to an existing 
segment.

Full refresh mode -> does aggregation on the table data (all segments), i.e. an 
insert-overwrite operation, whereas incremental refresh creates a new segment 
for the new incoming data.

So, in the INSERT case, the number of rows will be the same as in the parent 
table, and when you do "select * from mv_table", the data is 
partially-aggregated.

When the query that you created the MV on is fired, the remaining aggregation 
is done on this partially-aggregated data and the results are returned.

So, in your case, this is not an issue. For the INSERT case, if you don't want 
to load to the MV for each row, you can create the MV "with deferred refresh" 
and refresh it when required.

Please have a look at the design document linked below for more details.

[https://docs.google.com/document/d/1AACOYmBpwwNdHjJLOub0utSc6JCBMZn8VL5CvZ9hygA/edit]
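
As an illustration of the deferred-refresh option mentioned above (syntax as 
per the CarbonData MV guide; the statement reuses the table names from this 
thread, so adjust them as needed):

{code:sql}
-- Create the MV with deferred refresh, so each INSERT into the parent
-- table does NOT trigger an immediate load into the MV:
CREATE MATERIALIZED VIEW IF NOT EXISTS fact_365_1_eutrancell_21_30_minute
WITH DEFERRED REFRESH
AS SELECT tags_id, metric, ts2, timeseries(ts, 'thirty_minute') AS ts,
          sum(value), min(value), max(value)
   FROM fact_365_1_eutrancell_21
   GROUP BY metric, tags_id, timeseries(ts, 'thirty_minute'), ts2;

-- Later, after a batch of inserts, trigger the aggregation manually:
REFRESH MATERIALIZED VIEW fact_365_1_eutrancell_21_30_minute;
{code}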

 

> Carbondata 2.1.1 MV : Incremental refresh : Doesnot aggregate data correctly 
> -
>
> Key: CARBONDATA-4239
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4239
> Project: CarbonData
>  Issue Type: Bug
>  Components: core, data-load
>Affects Versions: 2.1.1
> Environment: RHEL  spark-2.4.5-bin-hadoop2.7 for carbon 2.1.1 
>Reporter: Sushant Sammanwar
>Priority: Major
>  Labels: Materialistic_Views, materializedviews, refreshnodes
>
> Hi Team ,
> We are doing a POC with Carbondata using MV.
> Our MV does not contain the AVG function, as we wanted to utilize the 
> incremental refresh feature.
> But with incremental refresh, we noticed the MV does not aggregate values 
> correctly.
> If a row is inserted, it creates another row in the MV instead of adding the 
> incremental value.
> As a result, the no. of rows in the MV is almost the same as in the raw table.
> This does not happen with a full-refresh MV.
> Below is the data in MV with 3 rows :
> scala> carbon.sql("select * from fact_365_1_eutrancell_21_30_minute").show()
> ++---+---+--+-+-++
> |fact_365_1_eutrancell_21_tags_id|fact_365_1_eutrancell_21_metric| ts| 
> sum_value|min_value|max_value|fact_365_1_eutrancell_21_ts2|
> ++---+---+--+-+-++
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 
> 06:30:00|5412.68105| 31.345| 4578.112| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 05:30:00| 1176.7035| 
> 392.2345| 392.2345| 2020-09-25 05:30:00|
> | ff6cb0f7-fba0-413...| eUtranCell.HHO.X2...|2020-09-25 06:00:00| 58.112| 
> 58.112| 58.112| 2020-09-25 05:30:00|
> ++---+---+--+-+-++
> Below, I am inserting data for the 6th hour; it should add incremental values 
> to the 6th-hour row of the MV.
> Note the data being inserted: the columns which are part of the group-by 
> clause have the same values as the existing data.
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 06:05:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',118.112,'2020-09-25
>  05:30:00')").show()
> 21/06/28 16:01:31 AUDIT audit: \{"time":"June 28, 2021 4:01:31 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332282307468267","opStatus":"START"}
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:33 AUDIT audit: \{"time":"June 28, 2021 4:01:33 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"7332284066443156","opStatus":"START"}
> [Stage 40:=>(199 + 1) / 
> 200]21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row 
> batch one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
> 21/06/28 16:01:44 AUDIT audit: \{"time":"June 28, 2021 4:01:44 PM 
> IST","username":"root","opName":"INSERT 
> 

[jira] [Updated] (CARBONDATA-4183) Local sort Partition Load and Compaction improvement

2021-05-12 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4183:
-
Description: Currently, the number of tasks for a partition-table local sort 
load is decided based on the input file size. In this case, the data will not 
be properly sorted, because more tasks are launched. For compaction, the number 
of tasks is equal to the number of partitions; if the data for a partition is 
huge, compaction may fail with OOM under low-memory configurations.

> Local sort Partition Load and Compaction improvement
> 
>
> Key: CARBONDATA-4183
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4183
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Currently, the number of tasks for a partition-table local sort load is 
> decided based on the input file size. In this case, the data will not be 
> properly sorted, because more tasks are launched. For compaction, the number 
> of tasks is equal to the number of partitions; if the data for a partition is 
> huge, compaction may fail with OOM under low-memory configurations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4183) Local sort Partition Load and Compaction improvement

2021-05-12 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4183:


 Summary: Local sort Partition Load and Compaction improvement
 Key: CARBONDATA-4183
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4183
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4166) GeoSpatial Query Enhancements

2021-04-27 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4166:
-
Description: 
Design document link:

[https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing]

 

[^Geo-spatial Enhancements_v1.pdf]

 

^Version 2:^

[https://docs.google.com/document/d/19AQj90Rll9iXBVcCMpcuKanwgDE6wuiGMYYIJzUuktk/edit?usp=sharing]

[^Geo-spatial Enhancements_v2.pdf]

  was:
Design document link:

[https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing]

 

[^Geo-spatial Enhancements_v1.pdf]


> GeoSpatial Query Enhancements
> -
>
> Key: CARBONDATA-4166
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4166
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: Geo-spatial Enhancements_v1.pdf, Geo-spatial 
> Enhancements_v2.pdf
>
>
> Design document link:
> [https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing]
>  
> [^Geo-spatial Enhancements_v1.pdf]
>  
> ^Version 2:^
> [https://docs.google.com/document/d/19AQj90Rll9iXBVcCMpcuKanwgDE6wuiGMYYIJzUuktk/edit?usp=sharing]
> [^Geo-spatial Enhancements_v2.pdf]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4166) GeoSpatial Query Enhancements

2021-04-27 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4166:
-
Attachment: Geo-spatial Enhancements_v2.pdf

> GeoSpatial Query Enhancements
> -
>
> Key: CARBONDATA-4166
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4166
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: Geo-spatial Enhancements_v1.pdf, Geo-spatial 
> Enhancements_v2.pdf
>
>
> Design document link:
> [https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing]
>  
> [^Geo-spatial Enhancements_v1.pdf]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4172) Select query having parent and child struct column in projection returns incorrect results

2021-04-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4172:


 Summary: Select query having  parent and child struct column in 
projection returns incorrect results
 Key: CARBONDATA-4172
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4172
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh


struct column: col1 struct<a:int, b:int, c:string>

insert: named_struct('a',1,'b',2,'c','a')

Query : select col1,col1.a from table;

Result:

col1 col1.a

{a:1,b:null,c:null}  1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4166) GeoSpatial Query Enhancements

2021-04-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4166:
-
Description: 
Design document link:

https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing

> GeoSpatial Query Enhancements
> -
>
> Key: CARBONDATA-4166
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4166
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Design document link:
> https://docs.google.com/document/d/1YmTbHa0P39P8iURb_IZErIdZzx7jFgZ5fw6E2BdHjKQ/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4166) GeoSpatial Query Enhancements

2021-04-15 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4166:


 Summary: GeoSpatial Query Enhancements
 Key: CARBONDATA-4166
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4166
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4147) Carbondata 2.1.0 MV ERROR inserting data into table with MV

2021-03-24 Thread Indhumathi Muthumurugesh (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307584#comment-17307584
 ] 

Indhumathi Muthumurugesh commented on CARBONDATA-4147:
--

1. If you have installed carbon using the release jars from 
[https://carbondata.apache.org/], you have to wait for the carbon 2.1.1 
release; the fix will be included in that release.

2. If you have a local carbon code setup, you can cherry-pick 
[https://github.com/apache/carbondata/pull/4106] and build the carbon jars. To 
build the carbon jars, you can refer to 
[https://github.com/apache/carbondata/blob/master/build/README.md] 

> Carbondata 2.1.0 MV  ERROR inserting data into table with MV
> 
>
> Key: CARBONDATA-4147
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4147
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
> Environment: Apache carbondata 2.1.0
>Reporter: Sushant Sammanwar
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: datatype,double, materializedviews
> Fix For: 2.1.1
>
> Attachments: carbondata_210_insert_error_stack-trace
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Hi Team ,
>  
> We are working on a POC where we are using carbon 2.1.0.
> We have created below tables,  MV :
> create table if not exists fact_365_1_eutrancell_21 (ts timestamp, metric 
> STRING, tags_id STRING, value DOUBLE) partitioned by (ts2 timestamp) stored 
> as carbondata TBLPROPERTIES ('SORT_COLUMNS'='metric')
> create materialized view if not exists fact_365_1_eutrancell_21_30_minute as 
> select tags_id ,metric ,ts2, timeseries(ts,'thirty_minute') as 
> ts,sum(value),avg(value),min(value),max(value) from fact_365_1_eutrancell_21 
> group by metric, tags_id, timeseries(ts,'thirty_minute') ,ts2
>  
> When I try to insert data into the above table, the below error is thrown:
> scala> carbon.sql("insert into fact_365_1_eutrancell_21 values ('2020-09-25 
> 05:30:00','eUtranCell.HHO.X2.InterFreq.PrepAttOut','ff6cb0f7-fba0-4134-81ee-55e820574627',392.2345,'2020-09-25
>  05:30:00')").show()
>  21/03/10 22:32:20 AUDIT audit: \{"time":"March 10, 2021 10:32:20 PM 
> IST","username":"root","opName":"INSERT 
> INTO","opId":"33474031950342736","opStatus":"START"}
>  [Stage 0:> (0 + 1) / 1]21/03/10 22:32:32 WARN CarbonOutputIteratorWrapper: 
> try to poll a row batch one more time.
>  21/03/10 22:32:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
>  21/03/10 22:32:32 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
>  21/03/10 22:32:36 WARN log: Updating partition stats fast for: 
> fact_365_1_eutrancell_21
>  21/03/10 22:32:36 WARN log: Updated size to 2699
>  21/03/10 22:32:38 AUDIT audit: \{"time":"March 10, 2021 10:32:38 PM 
> IST","username":"root","opName":"INSERT 
> OVERWRITE","opId":"33474049863830951","opStatus":"START"}
>  [Stage 3:==>(199 + 1) / 
> 200]21/03/10 22:33:07 WARN CarbonOutputIteratorWrapper: try to poll a row 
> batch one more time.
>  21/03/10 22:33:07 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
>  21/03/10 22:33:07 WARN CarbonOutputIteratorWrapper: try to poll a row batch 
> one more time.
>  21/03/10 22:33:07 ERROR CarbonFactDataHandlerColumnar: Error in producer
>  java.lang.ClassCastException: java.lang.Double cannot be cast to 
> java.lang.Long
>  at 
> org.apache.carbondata.core.datastore.page.ColumnPage.putData(ColumnPage.java:402)
>  at 
> org.apache.carbondata.processing.store.TablePage.convertToColumnarAndAddToPages(TablePage.java:239)
>  at 
> org.apache.carbondata.processing.store.TablePage.addRow(TablePage.java:201)
>  at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.processDataRows(CarbonFactDataHandlerColumnar.java:397)
>  at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.access$500(CarbonFactDataHandlerColumnar.java:60)
>  at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:637)
>  at 
> org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar$Producer.call(CarbonFactDataHandlerColumnar.java:614)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> It seems the method is converting "decimal" data type of table to a "long" 
> data type for MV.
> During value conversion it is throwing the error.
> Could you please check if this is a defect / bug 

[jira] [Created] (CARBONDATA-4156) Segment min max is not written considering all blocks in a segment

2021-03-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4156:


 Summary: Segment min max is not written considering all blocks in 
a segment
 Key: CARBONDATA-4156
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4156
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4155) CReate table like on table with MV fails

2021-03-22 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4155:
-
Description: 
steps to reproduce:

{color:#067d17}create table maintable(name string, c_code int, price int) 
STORED AS carbondata;{color}

{color:#067d17}create materialized view mv_table as select name, sum(price) 
from maintable group by name;{color}

{color:#067d17}create table new_Table like maintable;{color}

{color:#172b4d}Result:
{color}

2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - 
org.apache.spark.sql.AnalysisException: == Spark Parser: 
org.apache.spark.sql.execution.SparkSqlParser ==

extraneous input 'default' expecting \{')', ','}(line 8, pos 25)

== SQL ==
CREATE TABLE default.new_table
(`name` string,`c_code` int,`price` int)
USING carbondata
OPTIONS (
 indexexists "false",
 sort_columns "",
 comment "",
 relatedmvtablesmap "\{"default":["mv_table"]}",
-^^^
 bad_record_path "",
 local_dictionary_enable "true",
 indextableexists "false",
 tableName "new_table",
 dbName "default",
 tablePath 
"/home/root1/carbondata/integration/spark/target/warehouse/new_table",
 path 
"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table",
 isExternal "false",
 isTransactional "true",
 isVisible "true"
 ,carbonSchemaPartsNo '1',carbonSchema0 
'\{"databaseName":"default","tableUniqueName":"default_new_table","factTable":{"tableId":"4ddbaea5-42b8-4ca2-b0ce-dec0af81d3b6","tableName":"new_table","listOfColumns":[{"dataType":{"id":0,"precedenceOrder":0,"name":"STRING","sizeInBytes":-1},"columnName":"name","columnUniqueId":"2293eee8-41fa-4869-8275-8c16a5dd7222","columnReferenceId":"2293eee8-41fa-4869-8275-8c16a5dd7222","isColumnar":true,"encodingList":[],"isDimensionColumn":true,"scale":-1,"precision":-1,"schemaOrdinal":0,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":true},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"c_code","columnUniqueId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","columnReferenceId":"cc3ab016-51e9-4791-8f37-8d697d972b8a","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":1,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false},\{"dataType":{"id":5,"precedenceOrder":3,"name":"INT","sizeInBytes":4},"columnName":"price","columnUniqueId":"c67ed6d5-8f10-488f-a990-dfda20739907","columnReferenceId":"c67ed6d5-8f10-488f-a990-dfda20739907","isColumnar":true,"encodingList":[],"isDimensionColumn":false,"scale":-1,"precision":-1,"schemaOrdinal":2,"numberOfChild":0,"columnProperties":{},"invisible":false,"isSortColumn":false,"aggFunction":"","timeSeriesFunction":"","isLocalDictColumn":false}],"schemaEvolution":\{"schemaEvolutionEntryList":[{"timeStamp":1616425806915}]},"tableProperties":\{"indexexists":"false","sort_columns":"","comment":"","relatedmvtablesmap":"{\"default\":[\"mv_table\"]}","bad_record_path":"","local_dictionary_enable":"true","indextableexists":"false"}},"lastUpdatedTime":1616425806915,"tablePath":"file:/home/root1/carbondata/integration/spark/target/warehouse/new_table","isTransactionalTable":true,"hasColumnDrift"
:false,"isSchemaModified":false}')

> CReate table like on table with MV fails 
> -
>
> Key: CARBONDATA-4155
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4155
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>
> steps to reproduce:
> {color:#067d17}create table maintable(name string, c_code int, price int) 
> STORED AS carbondata;{color}
> {color:#067d17}create materialized view mv_table as select name, sum(price) 
> from maintable group by name;{color}
> {color:#067d17}create table new_Table like maintable;{color}
> {color:#172b4d}Result:
> {color}
> 2021-03-22 20:40:06 ERROR CarbonCreateTableCommand:176 - 
> org.apache.spark.sql.AnalysisException: == Spark Parser: 
> org.apache.spark.sql.execution.SparkSqlParser ==
> extraneous input 'default' expecting \{')', ','}(line 8, pos 25)
> == SQL ==
> CREATE TABLE default.new_table
> (`name` string,`c_code` int,`price` int)
> USING carbondata
> OPTIONS (
>  indexexists "false",
>  sort_columns "",
>  comment "",
>  relatedmvtablesmap "\{"default":["mv_table"]}",
> -^^^
>  bad_record_path "",
>  local_dictionary_enable "true",
>  indextableexists "false",
>  tableName "new_table",
>  dbName "default",
>  tablePath 
> "/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  path 
> "file:/home/root1/carbondata/integration/spark/target/warehouse/new_table",
>  

[jira] [Created] (CARBONDATA-4155) CReate table like on table with MV fails

2021-03-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4155:


 Summary: CReate table like on table with MV fails 
 Key: CARBONDATA-4155
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4155
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4153) DoNot Push down 'not equal to' filter with Cast on SI

2021-03-17 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4153:
-
Description: 
A NOT EQUAL TO filter on an SI index column should not be pushed down to the SI 
table.

Currently, where x != '2' is not pushed down to SI, but where x != 2 is pushed 
down to SI.
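
To illustrate the two cases (hypothetical table t with an INT column x covered 
by a secondary index; per this report, neither filter should be pushed down):

{code:sql}
-- Literal matches the column type: currently pushed down to the SI table.
SELECT * FROM t WHERE x != 2;

-- String literal forces a cast on x: currently NOT pushed down to SI.
SELECT * FROM t WHERE x != '2';
{code}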

> DoNot Push down 'not equal to' filter with Cast on SI
> -
>
> Key: CARBONDATA-4153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4153
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>
> A NOT EQUAL TO filter on an SI index column should not be pushed down to the 
> SI table.
> Currently, where x != '2' is not pushed down to SI, but where x != 2 is 
> pushed down to SI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4153) DoNot Push down 'not equal to' filter with Cast on SI

2021-03-17 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4153:


 Summary: DoNot Push down 'not equal to' filter with Cast on SI
 Key: CARBONDATA-4153
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4153
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-2091) Enhance data loading performance by specifying range bounds for sort columns

2020-11-22 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-2091:
-
Fix Version/s: (was: 2.0.2)
   2.0.1

> Enhance data loading performance by specifying range bounds for sort columns
> 
>
> Key: CARBONDATA-2091
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2091
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Chuanyin Xu
>Assignee: Chuanyin Xu
>Priority: Major
> Fix For: 2.0.1
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Currently in carbondata, data loading using node_sort (also known as 
> local_sort) has the following procedures:
>  # convert the input data in batch. (*Convert*)
>  # sort the batch and write to the sort temp files. (*TempSort*)
>  # combine the sort temp files and do merge sort to get a bigger ordered sort 
> temp file. (*MergeSort*)
>  # combine all the sort temp files and do a final sort, its results will feed 
> the next procedure. (*FinalSort*)
>  # get rows in order and convert rows to carbondata columnar format pages. 
> (*produce*)
>  # Write bundles of pages to files and write the corresponding index file. 
> (*consume*)
> Steps 1~3 are done concurrently using multiple threads. Step 4 is done using 
> only one thread, and Step 5 is again multi-threaded, so Step 4 is the 
> bottleneck among all the procedures. When observing the data loading 
> performance, we can see that the CPU usage after Step 3 is low.
>  
> We can enhance the data loading performance by parallelizing Step4.
>  
> User can specify range bounds for the sort columns and carbondata internally 
> distributes the records to different ranges and process the data concurrently 
> in different ranges.
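
The range bounds described above can be sketched as a load option (based on the 
SORT_COLUMN_BOUNDS option in the CarbonData DML documentation; the table, path, 
and bound values here are illustrative):

{code:sql}
CREATE TABLE sales (id INT, city STRING, amount DOUBLE)
STORED AS carbondata
TBLPROPERTIES ('SORT_COLUMNS'='id,city');

-- The three bounds below split the sort-column space into four ranges;
-- records are distributed to the ranges and sorted concurrently, so the
-- final sort (Step 4) is parallelized.
LOAD DATA INPATH 'hdfs://hacluster/data/sales.csv' INTO TABLE sales
OPTIONS ('SORT_COLUMN_BOUNDS'='100,beijing;200,shanghai;300,shenzhen');
{code}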



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store

2020-10-23 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4043:
-
Description: 
h3. When a dimension is added in older versions like 1.1, it will be a sort 
column by default. In the sort step we assume sort columns come at the 
beginning of the row, but the added column will be at the end even though it is 
a sort column. So, while building the data-load configuration, we rearrange the 
columns (dimensions and data fields) to bring the sort columns to the beginning 
and no-sort columns to the end, and revert them back to schema order before the 
FinalMerge/DataWriter step.

Issue:
 Data loading fails with a ClassCastException in the data-writing step in case 
of NO_SORT, and in the final sort step in case of LOCAL_SORT.

> Fix data load failure issue for columns added in legacy store
> -
>
> Key: CARBONDATA-4043
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4043
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> h3. When a dimension is added in older versions like 1.1, it will be a sort 
> column by default. In the sort step we assume sort columns come at the 
> beginning of the row, but the added column will be at the end even though it 
> is a sort column. So, while building the data-load configuration, we 
> rearrange the columns (dimensions and data fields) to bring the sort columns 
> to the beginning and no-sort columns to the end, and revert them back to 
> schema order before the FinalMerge/DataWriter step.
> Issue:
>  Data loading fails with a ClassCastException in the data-writing step in 
> case of NO_SORT, and in the final sort step in case of LOCAL_SORT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4043) Fix data load failure issue for columns added in legacy store

2020-10-23 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4043:


 Summary: Fix data load failure issue for columns added in legacy 
store
 Key: CARBONDATA-4043
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4043
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Indhumathi Muthumurugesh






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-2868) Create Table DDL support for Map DataType

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-2868.
--
Resolution: Fixed

> Create Table DDL support for Map DataType
> -
>
> Key: CARBONDATA-2868
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2868
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-2867) Load DDL support for Map DataType

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-2867.
--
Resolution: Fixed

> Load DDL support for Map DataType
> -
>
> Key: CARBONDATA-2867
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2867
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>






[jira] [Resolved] (CARBONDATA-3739) Select with order by columns not in projection gives wrong results

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3739.
--
Resolution: Fixed

> Select with order by columns not in projection gives wrong results
> --
>
> Key: CARBONDATA-3739
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3739
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>






[jira] [Resolved] (CARBONDATA-3594) Optimize getSplits() during compaction

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3594.
--
Resolution: Fixed

> Optimize getSplits() during compaction
> --
>
> Key: CARBONDATA-3594
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3594
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3750) Support segment level MinMax for Secondary Index

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3750.
--
Resolution: Fixed

> Support segment level MinMax for Secondary Index
> 
>
> Key: CARBONDATA-3750
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3750
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>






[jira] [Resolved] (CARBONDATA-3890) Fix MV case sensitive issues with ImplicitCastInputTypes and Add Doc for Show MV

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3890.
--
Resolution: Fixed

> Fix MV case sensitive issues with ImplicitCastInputTypes and Add Doc for Show 
> MV
> 
>
> Key: CARBONDATA-3890
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3890
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3931) Secondary index with index column as DateType gives wrong results

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3931.
--
Resolution: Fixed

> Secondary index with index column as DateType gives wrong results
> -
>
> Key: CARBONDATA-3931
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3931
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>






[jira] [Closed] (CARBONDATA-3146) Support Load data using Json for CarbonSession

2020-10-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh closed CARBONDATA-3146.

Resolution: Later

> Support Load data using Json for CarbonSession
> --
>
> Key: CARBONDATA-3146
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3146
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Minor
>






[jira] [Updated] (CARBONDATA-4009) PartialQuery not hitting mv

2020-09-25 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-4009:
-
Summary: PartialQuery not hitting mv  (was: SubQuery not hitting mv)

> PartialQuery not hitting mv
> ---
>
> Key: CARBONDATA-4009
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4009
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-4009) SubQuery not hitting mv

2020-09-24 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-4009:


 Summary: SubQuery not hitting mv
 Key: CARBONDATA-4009
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4009
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3990) Fix DropCache log error when indexmap is null

2020-09-15 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3990:


 Summary: Fix DropCache log error  when indexmap is null
 Key: CARBONDATA-3990
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3990
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3982) Use Partition instead of Span to split legacy and non-legacy segments for executor distribution in indexserver

2020-09-10 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3982:


 Summary: Use Partition instead of Span to split legacy and 
non-legacy segments for executor distribution in indexserver 
 Key: CARBONDATA-3982
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3982
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3969) Fix Deserialization issue with DataType class

2020-09-01 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3969:


 Summary: Fix Deserialization issue with DataType class
 Key: CARBONDATA-3969
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3969
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3946) Support IndexServer with Presto Engine

2020-08-07 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3946:


 Summary: Support IndexServer with Presto Engine
 Key: CARBONDATA-3946
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3946
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3931) Secondary index with index column as DateType gives wrong results

2020-07-29 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3931:


 Summary: Secondary index with index column as DateType gives wrong 
results
 Key: CARBONDATA-3931
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3931
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3916) Support array with SI

2020-07-20 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3916:


 Summary: Support array with SI
 Key: CARBONDATA-3916
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3916
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 After this delete operation, the rows in partitions 0, 1 and 2 should have been deleted.

Actual:

{color:#067d17}select * from target order by key;{color}

{color:#067d17}+---+-----+
|key|value|
+---+-----+
|a  |0    |
|b  |1    |
|c  |2    |
|d  |3    |
+---+-----+{color}

{color:#067d17}Expected:{color}

{color:#067d17}+---+-----+
|key|value|
+---+-----+
|d  |3    |
+---+-----+{color}
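The repro's `max(struct(time, newValue, deleted))` subquery selects, per key, the change with the greatest `time`, since struct comparison in Spark SQL is field-by-field from the left. The same latest-record selection can be sketched in plain Scala:

```scala
// One CDC change record, mirroring the `changes` view in the repro above.
case class Change(key: String, newValue: Option[String], deleted: Boolean, time: Int)

val changes = Seq(
  Change("a", Some("10"),  deleted = false, time = 0),
  Change("a", None,        deleted = true,  time = 1),
  Change("b", None,        deleted = true,  time = 2),
  Change("c", None,        deleted = true,  time = 3),
  Change("c", Some("200"), deleted = false, time = 5)
)

// Per key, keep only the newest change -- the row with the maximum time,
// which is what max(struct(time, ...)) computes in the SQL form.
val latest: Map[String, Change] =
  changes.groupBy(_.key).map { case (k, cs) => k -> cs.maxBy(_.time) }

// a and b end on a delete; c's newest change is an update to "200".
```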

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute(){code}
 

 

abc

  was:
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Attachment: (was: issue.scala)

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> [^issue.scala]
> {code:java}
> // code placeholder
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import 
> org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
>  DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
> MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
> WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
> StringType, StructField, StructType}
> import spark.implicits._
> sql("drop table if exists target").show()
> val initframe = spark.createDataFrame(Seq(
> Row("a", "0"),
> Row("b", "1"),
> Row("c", "2"),
> Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
> StringType))))
> initframe.write
> .format("carbondata")
> .option("tableName", "target")
> .option("partitionColumns", "value")
> .mode(SaveMode.Overwrite)
> .save()
> val target = spark.read.format("carbondata").option("tableName", 
> "target").load()
> var ccd =
> spark.createDataFrame(Seq(
> Row("a", "10", false, 0),
> Row("a", null, true, 1),
> Row("b", null, true, 2),
> Row("c", null, true, 3),
> Row("c", "20", false, 4),
> Row("c", "200", false, 5),
> Row("e", "100", false, 6)
> ).asJava,
> StructType(Seq(StructField("key", StringType),
> StructField("newValue", StringType),
> StructField("deleted", BooleanType), StructField("time", IntegerType))))
> ccd.createOrReplaceTempView("changes")
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
> FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM 
> changes GROUP BY key)")
> val updateMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> target.as("A").merge(ccd.as("B"), "A.key=B.key").
> whenMatched("B.deleted=true").
> delete().execute()
> {code}





[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
{code:java}
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}

  was:
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> {code:java}
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import 
org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand,
 DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, 
MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, 
WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, 
StringType, StructField, StructType}
import spark.implicits._


sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}

  was:
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> [^issue.scala]
> {code:java}
> // code placeholder

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Attachment: issue.scala

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
> import spark.implicits._
>   
> sql("drop table if exists target").show()
> val initframe = spark.createDataFrame(Seq(
>   Row("a", "0"),
>   Row("b", "1"),
>   Row("c", "2"),
>   Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
> StringType))))
> initframe.write
>   .format("carbondata")
>   .option("tableName", "target")
>   .option("partitionColumns", "value")
>   .mode(SaveMode.Overwrite)
>   .save()
>   
> val target = spark.read.format("carbondata").option("tableName", 
> "target").load()
> var ccd =
>   spark.createDataFrame(Seq(
> Row("a", "10", false,  0),
> Row("a", null, true, 1),   
> Row("b", null, true, 2),   
> Row("c", null, true, 3),   
> Row("c", "20", false, 4),
> Row("c", "200", false, 5),
> Row("e", "100", false, 6) 
>   ).asJava,
> StructType(Seq(StructField("key", StringType),
>   StructField("newValue", StringType),
> StructField("deleted", BooleanType), StructField("time", IntegerType))))
> 
> ccd.createOrReplaceTempView("changes")
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
> FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM 
> changes GROUP BY key)")
> val updateMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> target.as("A").merge(ccd.as("B"), "A.key=B.key").
>   whenMatched("B.deleted=true").
>   delete().execute()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :

[^issue.scala]
{code:java}
// code placeholder
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
Row("a", "0"),
Row("b", "1"),
Row("c", "2"),
Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", StringType))))

initframe.write
.format("carbondata")
.option("tableName", "target")
.option("partitionColumns", "value")
.mode(SaveMode.Overwrite)
.save()

val target = spark.read.format("carbondata").option("tableName", "target").load()

var ccd =
spark.createDataFrame(Seq(
Row("a", "10", false, 0),
Row("a", null, true, 1),
Row("b", null, true, 2),
Row("c", null, true, 3),
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6)
).asJava,
StructType(Seq(StructField("key", StringType),
StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))

ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]
val insertMap = Map("key" -> "B.key", "value" -> "B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
whenMatched("B.deleted=true").
delete().execute()
{code}

  was:
Steps to Reproduce Issue :
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()
  
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
  spark.createDataFrame(Seq(
Row("a", "10", false,  0),
Row("a", null, true, 1),   
Row("b", null, true, 2),   
Row("c", null, true, 3),   
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6) 
  ).asJava,
StructType(Seq(StructField("key", StringType),
  StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))
  
ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()


> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: issue.scala
>
>
> Steps to Reproduce Issue :
> 

[jira] [Updated] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3902:
-
Description: 
Steps to Reproduce Issue :
import scala.collection.JavaConverters._
import java.sql.Date
import org.apache.spark.sql._
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.test.util.QueryTest
import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
import spark.implicits._

sql("drop table if exists target").show()

val initframe = spark.createDataFrame(Seq(
  Row("a", "0"),
  Row("b", "1"),
  Row("c", "2"),
  Row("d", "3")
).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
StringType))))

initframe.write
  .format("carbondata")
  .option("tableName", "target")
  .option("partitionColumns", "value")
  .mode(SaveMode.Overwrite)
  .save()
  
val target = spark.read.format("carbondata").option("tableName", 
"target").load()

var ccd =
  spark.createDataFrame(Seq(
Row("a", "10", false,  0),
Row("a", null, true, 1),   
Row("b", null, true, 2),   
Row("c", null, true, 3),   
Row("c", "20", false, 4),
Row("c", "200", false, 5),
Row("e", "100", false, 6) 
  ).asJava,
StructType(Seq(StructField("key", StringType),
  StructField("newValue", StringType),
StructField("deleted", BooleanType), StructField("time", IntegerType))))
  
ccd.createOrReplaceTempView("changes")

ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM changes 
GROUP BY key)")

val updateMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

val insertMap = Map("key" -> "B.key", "value" -> 
"B.newValue").asInstanceOf[Map[Any, Any]]

target.as("A").merge(ccd.as("B"), "A.key=B.key").
  whenMatched("B.deleted=true").
  delete().execute()

> Query on partition table gives incorrect results after Delete records using 
> CDC
> ---
>
> Key: CARBONDATA-3902
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Steps to Reproduce Issue :
> import scala.collection.JavaConverters._
> import java.sql.Date
> import org.apache.spark.sql._
> import org.apache.spark.sql.CarbonSession._
> import org.apache.spark.sql.catalyst.TableIdentifier
> import org.apache.spark.sql.execution.command.mutation.merge.{CarbonMergeDataSetCommand, DeleteAction, InsertAction, InsertInHistoryTableAction, MergeDataSetMatches, MergeMatch, UpdateAction, WhenMatched, WhenNotMatched, WhenNotMatchedAndExistsOnlyOnTarget}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.test.util.QueryTest
> import org.apache.spark.sql.types.{BooleanType, DateType, IntegerType, StringType, StructField, StructType}
> import spark.implicits._
>   
> sql("drop table if exists target").show()
> val initframe = spark.createDataFrame(Seq(
>   Row("a", "0"),
>   Row("b", "1"),
>   Row("c", "2"),
>   Row("d", "3")
> ).asJava, StructType(Seq(StructField("key", StringType), StructField("value", 
> StringType))))
> initframe.write
>   .format("carbondata")
>   .option("tableName", "target")
>   .option("partitionColumns", "value")
>   .mode(SaveMode.Overwrite)
>   .save()
>   
> val target = spark.read.format("carbondata").option("tableName", 
> "target").load()
> var ccd =
>   spark.createDataFrame(Seq(
> Row("a", "10", false,  0),
> Row("a", null, true, 1),   
> Row("b", null, true, 2),   
> Row("c", null, true, 3),   
> Row("c", "20", false, 4),
> Row("c", "200", false, 5),
> Row("e", "100", false, 6) 
>   ).asJava,
> StructType(Seq(StructField("key", StringType),
>   StructField("newValue", StringType),
> StructField("deleted", BooleanType), StructField("time", IntegerType))))
> 
> ccd.createOrReplaceTempView("changes")
> ccd = sql("SELECT key, latest.newValue as newValue, latest.deleted as deleted 
> FROM ( SELECT key, max(struct(time, newValue, deleted)) as latest FROM 
> changes GROUP BY key)")
> val updateMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> val insertMap = Map("key" -> "B.key", "value" -> 
> "B.newValue").asInstanceOf[Map[Any, Any]]
> 

[jira] [Created] (CARBONDATA-3902) Query on partition table gives incorrect results after Delete records using CDC

2020-07-15 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3902:


 Summary: Query on partition table gives incorrect results after 
Delete records using CDC
 Key: CARBONDATA-3902
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3902
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3890) Fix MV case sensitive issues with ImplicitCastInputTypes and Add Doc for Show MV

2020-07-05 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3890:


 Summary: Fix MV case sensitive issues with ImplicitCastInputTypes 
and Add Doc for Show MV
 Key: CARBONDATA-3890
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3890
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3864) Store Size Optimization

2020-06-22 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3864:


 Summary: Store Size Optimization
 Key: CARBONDATA-3864
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3864
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[jira] [Reopened] (CARBONDATA-3808) Add documentation for cdc and scd scenarios

2020-05-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh reopened CARBONDATA-3808:
--

> Add documentation for cdc and scd scenarios
> ---
>
> Key: CARBONDATA-3808
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3808
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-3808) Add documentation for cdc and scd scenarios

2020-05-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-3808.
--
Resolution: Fixed

> Add documentation for cdc and scd scenarios
> ---
>
> Key: CARBONDATA-3808
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3808
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-3818) MV creation for table already having mv with same query doesn't throw error

2020-05-12 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3818:


 Summary: MV creation for table already having mv with same query 
doesn't throw error
 Key: CARBONDATA-3818
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3818
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3813) Add Example Class for MERGE syntax and update doc

2020-05-09 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3813:


 Summary: Add Example Class for MERGE syntax and update doc
 Key: CARBONDATA-3813
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3813
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3808) Add documentation for cdc and scd scenarios

2020-05-07 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3808:


 Summary: Add documentation for cdc and scd scenarios
 Key: CARBONDATA-3808
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3808
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3801) Query on partition table with SI having multiple partition columns gives empty results

2020-05-06 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3801:


 Summary: Query on partition table with SI having multiple partition 
columns gives empty results
 Key: CARBONDATA-3801
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3801
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3800) Load data to SI and MV after insert stage command

2020-05-06 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3800:


 Summary: Load data to SI and MV after insert stage command 
 Key: CARBONDATA-3800
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3800
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3781) Refactor code to optimize partition pruning

2020-04-23 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3781:


 Summary: Refactor code to optimize partition pruning
 Key: CARBONDATA-3781
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3781
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3773) Skip Validate partition info in Indexserver count star flow

2020-04-16 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3773:


 Summary: Skip Validate partition info in Indexserver count star 
flow
 Key: CARBONDATA-3773
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3773
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3768) Fix query not hitting mv without alias, with mv having Alias

2020-04-08 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3768:


 Summary: Fix query not hitting mv without alias, with mv having 
Alias
 Key: CARBONDATA-3768
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3768
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3765) Refactor Index Metadata for CG and FG Indexes

2020-04-03 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3765:


 Summary: Refactor Index Metadata for CG and FG Indexes
 Key: CARBONDATA-3765
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3765
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3762) Block creating materialized views with duplicate column

2020-04-03 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3762:


 Summary: Block creating materialized views with duplicate column
 Key: CARBONDATA-3762
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3762
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3750) Support segment level MinMax for Secondary Index

2020-03-18 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3750:


 Summary: Support segment level MinMax for Secondary Index
 Key: CARBONDATA-3750
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3750
 Project: CarbonData
  Issue Type: Sub-task
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3741) Fix ParseException from Hive during ALTER SET TBLPROPERTIES if database name starts with Underscore

2020-03-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3741:
-
Description: 
Queries:

drop database if exists _default cascade;
 create database _default;
 create table _default.OneRowTable(col1 string, col2 string, col3 int, col4 
double) STORED AS carbondata;
 insert into _default.OneRowTable select * from _default.OneRowTable;
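As a side note, one possible workaround (an assumption on my part, not the committed fix) is to backtick-quote identifiers that begin with an underscore before the statement reaches the Hive parser, since the grammar only accepts unquoted identifiers that start with a letter:

{code:java}
// Hypothetical helper, for illustration only: backtick-quote any identifier
// that does not start with a letter so the Hive parser accepts it.
def quoteIfNeeded(ident: String): String =
  if (ident.matches("[a-zA-Z][a-zA-Z0-9_]*")) ident else s"`$ident`"

quoteIfNeeded("_default")    // returns "`_default`"
quoteIfNeeded("OneRowTable") // returned unchanged
{code}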

 

Check the logs and find the exception:

NoViableAltException(13@[192:1: tableName : (db= identifier DOT tab= identifier 
-> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])
 at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
 at org.antlr.runtime.DFA.predict(DFA.java:144)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4747)
 at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:45920)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:7394)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2685)
 at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1650)
 at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1109)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:396)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:718)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:707)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:707)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:697)
 at 
org.apache.spark.sql.hive.CarbonSessionCatalogUtil$.alterTable(CarbonSessionCatalogUtil.scala:75)
 at 
org.apache.spark.sql.secondaryindex.util.CarbonInternalScalaUtil$.addOrModifyTableProperty(CarbonInternalScalaUtil.scala:367)
 at 
org.apache.spark.sql.secondaryindex.hive.CarbonInternalMetastore$.refreshIndexInfo(CarbonInternalMetastore.scala:180)
 at 
org.apache.spark.sql.secondaryindex.events.CreateCarbonRelationEventListener.onEvent(CreateCarbonRelationEventListener.scala:46)
 at 
org.apache.carbondata.events.OperationListenerBus.fireEvent(OperationListenerBus.java:83)
 at 
org.apache.spark.sql.hive.CarbonFileMetastore.readCarbonSchema(CarbonFileMetastore.scala:159)
 at 
org.apache.spark.sql.hive.CarbonFileMetastore.createCarbonRelation(CarbonFileMetastore.scala:139)
 at 
org.apache.spark.sql.CarbonDatasourceHadoopRelation.carbonRelation$lzycompute(CarbonDatasourceHadoopRelation.scala:60)
 at 
org.apache.spark.sql.CarbonDatasourceHadoopRelation.carbonRelation(CarbonDatasourceHadoopRelation.scala:58)
 at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts.castChildOutput(CarbonAnalysisRules.scala:279)
 at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$2.applyOrElse(CarbonAnalysisRules.scala:271)
 at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts$$anonfun$apply$2.applyOrElse(CarbonAnalysisRules.scala:265)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:248)
 at 
org.apache.spark.sql.hive.CarbonPreInsertionCasts.apply(CarbonAnalysisRules.scala:265)
 at 

[jira] [Created] (CARBONDATA-3741) Fix ParseException from Hive during ALTER SET TBLPROPERTIES if database name starts with Underscore

2020-03-15 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3741:


 Summary: Fix ParseException from Hive during ALTER SET 
TBLPROPERTIES if database name starts with Underscore
 Key: CARBONDATA-3741
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3741
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3739) Select with order by columns not in projection gives wrong results

2020-03-06 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3739:


 Summary: Select with order by columns not in projection gives 
wrong results
 Key: CARBONDATA-3739
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3739
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3665) Support TimeBased Cache expiration using ExpiringMap

2020-03-05 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3665:
-
Summary: Support TimeBased Cache expiration using ExpiringMap  (was: 
Support TimeBased Cache expiration using Guava Cache)

> Support TimeBased Cache expiration using ExpiringMap
> 
>
> Key: CARBONDATA-3665
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3665
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
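For reference, time-based expiry with the net.jodah ExpiringMap library that the new summary points to can be sketched as follows (assumed library usage, illustrative keys and values, not CarbonData's actual integration):

{code:java}
import java.util.concurrent.TimeUnit
import net.jodah.expiringmap.{ExpirationPolicy, ExpiringMap}

// Entries are dropped a fixed interval after creation;
// ExpirationPolicy.ACCESSED would instead reset the timer on every read.
val cache = ExpiringMap.builder()
  .expiration(60, TimeUnit.SECONDS)
  .expirationPolicy(ExpirationPolicy.CREATED)
  .build[String, String]()

cache.put("segmentId", "cachedIndexInfo") // evicted automatically after 60s
{code}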






[jira] [Created] (CARBONDATA-3733) MV with Limit queries with simple projection gives incorrect results

2020-03-02 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3733:


 Summary: MV with Limit queries with simple projection gives 
incorrect results
 Key: CARBONDATA-3733
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3733
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3732) ClassCastException is thrown while running queries with cast of rand() udf function

2020-03-02 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3732:


 Summary: ClassCastException is thrown while running queries with 
cast of rand() udf function
 Key: CARBONDATA-3732
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3732
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3718) Support SegmentLevel MinMax for better Pruning and less driver memory usage for cache

2020-02-21 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3718:


 Summary: Support SegmentLevel MinMax for better Pruning and less 
driver memory usage for cache
 Key: CARBONDATA-3718
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3718
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh
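The title refers to keeping only per-segment min/max values in the driver cache and skipping whole segments whose value range cannot match a filter. A minimal sketch of that idea (the segment names and `prune_segments` helper are invented for illustration, not CarbonData's actual code):

```python
# Hypothetical sketch: per-segment min/max on a column lets the driver
# skip segments whose [min, max] range cannot contain the filter value,
# while caching only two values per segment instead of per-blocklet stats.

segments = {
    "segment_0": (1, 100),    # (min, max) of the filter column
    "segment_1": (101, 200),
    "segment_2": (50, 150),
}

def prune_segments(stats, filter_value):
    """Return the segments whose min/max range could contain filter_value."""
    return sorted(
        name
        for name, (lo, hi) in stats.items()
        if lo <= filter_value <= hi
    )

# Only the segments whose range covers 120 need to be scanned.
print(prune_segments(segments, 120))  # -> ['segment_1', 'segment_2']
```

The memory saving comes from the cardinality: a table has far fewer segments than blocklets, so the driver cache shrinks accordingly.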








[jira] [Created] (CARBONDATA-3715) Fix Timeseries Query Rollup failure for timeseries column of Date type

2020-02-20 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3715:


 Summary: Fix Timeseries Query Rollup failure for timeseries column 
of Date type
 Key: CARBONDATA-3715
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3715
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3680) Support secondary index on carbon table

2020-02-05 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3680:
-
Attachment: Secondary Index.pdf

> Support secondary index on carbon table
> ---
>
> Key: CARBONDATA-3680
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3680
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: Secondary Index.pdf
>
>
> Currently, we have datamaps such as the *default datamaps* (block and blocklet), *coarse grained datamaps* like bloom, and *fine grained datamaps* like lucene, which help in better pruning during queries. What if we introduce another kind of datamap that can hold blockletId as an index? At the initial level, we call it an index, which will work as a child table to the main table, like the MV in our current code.
> Yes, let's introduce a secondary index on the carbon table, which will be a child table to the main table. It can be created on a column the way we create a lucene datamap, where we give index columns to create the index. Similarly, we create a secondary index on a column, so the indexes on that column will be blocklet IDs, which will help in better pruning and faster queries when there is a filter query on the index column.





[jira] [Updated] (CARBONDATA-3680) Support secondary index on carbon table

2020-02-05 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3680:
-
Attachment: (was: SI doc.pdf)

> Support secondary index on carbon table
> ---
>
> Key: CARBONDATA-3680
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3680
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: Secondary Index.pdf
>
>
> Currently, we have datamaps such as the *default datamaps* (block and blocklet), *coarse grained datamaps* like bloom, and *fine grained datamaps* like lucene, which help in better pruning during queries. What if we introduce another kind of datamap that can hold blockletId as an index? At the initial level, we call it an index, which will work as a child table to the main table, like the MV in our current code.
> Yes, let's introduce a secondary index on the carbon table, which will be a child table to the main table. It can be created on a column the way we create a lucene datamap, where we give index columns to create the index. Similarly, we create a secondary index on a column, so the indexes on that column will be blocklet IDs, which will help in better pruning and faster queries when there is a filter query on the index column.





[jira] [Updated] (CARBONDATA-3680) Support secondary index on carbon table

2020-02-05 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3680:
-
Attachment: SI doc.pdf

> Support secondary index on carbon table
> ---
>
> Key: CARBONDATA-3680
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3680
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: SI doc.pdf
>
>
> Currently, we have datamaps such as the *default datamaps* (block and blocklet), *coarse grained datamaps* like bloom, and *fine grained datamaps* like lucene, which help in better pruning during queries. What if we introduce another kind of datamap that can hold blockletId as an index? At the initial level, we call it an index, which will work as a child table to the main table, like the MV in our current code.
> Yes, let's introduce a secondary index on the carbon table, which will be a child table to the main table. It can be created on a column the way we create a lucene datamap, where we give index columns to create the index. Similarly, we create a secondary index on a column, so the indexes on that column will be blocklet IDs, which will help in better pruning and faster queries when there is a filter query on the index column.





[jira] [Updated] (CARBONDATA-3680) Support secondary index on carbon table

2020-02-05 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3680:
-
Description: 
Currently we have datamaps like, *default datamaps* which are block and 
blocklet and *coarse grained datamaps* like bloom, and *fine grained datamaps* 
like lucene
which helps in better pruning during query. What if we introduce another kind 
of datamap which can hold blockletId as index? Initial level, we call it as 
index which
will work as a child table to the main table like we have MV in our current 
code.

Yes, lets introduce the secondary index to carbon table which will be the child 
table to main table and it can be created on column like we create lucene 
datamap,
where we give index columns to create index. In a similar way, we create 
secondary index on column, so indexes on these column will be blocklet IDs 
which will
help in better pruning and faster query when we have a filter query on the 
index column.

> Support secondary index on carbon table
> ---
>
> Key: CARBONDATA-3680
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3680
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
>
> Currently, we have datamaps such as the *default datamaps* (block and blocklet), *coarse grained datamaps* like bloom, and *fine grained datamaps* like lucene, which help in better pruning during queries. What if we introduce another kind of datamap that can hold blockletId as an index? At the initial level, we call it an index, which will work as a child table to the main table, like the MV in our current code.
> Yes, let's introduce a secondary index on the carbon table, which will be a child table to the main table. It can be created on a column the way we create a lucene datamap, where we give index columns to create the index. Similarly, we create a secondary index on a column, so the indexes on that column will be blocklet IDs, which will help in better pruning and faster queries when there is a filter query on the index column.
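As a rough illustration of the idea in the description above, the following sketch (hypothetical; the `SecondaryIndex` class and its methods are invented for this example, not CarbonData's actual code) shows how a column-value-to-blocklet-ID index can prune the blocklets scanned by a filter query:

```python
# Hypothetical sketch of secondary-index pruning: the index maps each
# value of the indexed column to the set of blocklet IDs containing it,
# so a filter on that column only scans the matching blocklets.

class SecondaryIndex:
    def __init__(self):
        self.value_to_blocklets = {}  # column value -> set of blocklet IDs

    def add(self, value, blocklet_id):
        self.value_to_blocklets.setdefault(value, set()).add(blocklet_id)

    def prune(self, filter_value):
        # Blocklets to scan for `column = filter_value`; an empty set
        # means the filter matches no data at all.
        return self.value_to_blocklets.get(filter_value, set())


index = SecondaryIndex()
# Build the index while loading: (value, blocklet_id) pairs.
for value, blocklet in [("c1", 0), ("c2", 0), ("c1", 1), ("c3", 2)]:
    index.add(value, blocklet)

# A filter on the indexed column scans only blocklets 0 and 1,
# instead of all three.
print(sorted(index.prune("c1")))  # -> [0, 1]
```

In the actual proposal the index lives in a child table of the main table, so it is maintained alongside loads rather than in memory as shown here.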





[jira] [Created] (CARBONDATA-3680) Support secondary index on carbon table

2020-02-05 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3680:


 Summary: Support secondary index on carbon table
 Key: CARBONDATA-3680
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3680
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3665) Support TimeBased Cache expiration using Guava Cache

2020-01-16 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3665:


 Summary: Support TimeBased Cache expiration using Guava Cache
 Key: CARBONDATA-3665
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3665
 Project: CarbonData
  Issue Type: New Feature
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3658) Prune and Cache only Matched partitions for filter on Partitioned table

2020-01-08 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3658:


 Summary: Prune and Cache only Matched partitions for filter on 
Partitioned table 
 Key: CARBONDATA-3658
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3658
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[jira] [Resolved] (CARBONDATA-2840) Add SDV testcases for Complex DataType Support

2020-01-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-2840.
--
Resolution: Fixed

> Add SDV testcases for Complex DataType Support
> --
>
> Key: CARBONDATA-2840
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2840
> Project: CarbonData
>  Issue Type: Test
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2794) Code Generator Error is thrown when Select filter contains more than one count of distinct of ArrayOfStruct with group by Clause

2020-01-06 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh resolved CARBONDATA-2794.
--
Resolution: Fixed

> Code Generator Error is thrown when Select filter contains more than one 
> count of distinct of ArrayOfStruct with group by Clause
> 
>
> Key: CARBONDATA-2794
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2794
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3645) BadRecords are inserted as NULL when column is of complex data type and BAD_RECORDS_ACTION is IGNORE

2019-12-31 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3645:
-
Summary: BadRecords are inserted as NULL when column is of complex data 
type and BAD_RECORDS_ACTION is IGNORE  (was: Bad record data is inserted as 
NULL if column datatype is of complex type and BAD_RECORDS_ACTION is IGNORE)

> BadRecords are inserted as NULL when column is of complex data type and 
> BAD_RECORDS_ACTION is IGNORE
> 
>
> Key: CARBONDATA-3645
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3645
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
>






[jira] [Created] (CARBONDATA-3645) Bad record data is inserted as NULL if column datatype is of complex type and BAD_RECORDS_ACTION is IGNORE

2019-12-31 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3645:


 Summary: Bad record data is inserted as NULL if column datatype is 
of complex type and BAD_RECORDS_ACTION is IGNORE
 Key: CARBONDATA-3645
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3645
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3600) Fix creating mv timeseries UDF column as partition column

2019-12-30 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3600:
-
Description: 
Problem:
Issue 1:
When trying to create datamap with partition column in timeseries udf, throws 
Exception.
Issue 2:
When Create datamap was in progress, Jdbc application is killed. When 
restarting, datamap table not found exception is thrown.

> Fix creating mv timeseries UDF column as partition column
> -
>
> Key: CARBONDATA-3600
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3600
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Priority: Minor
> Fix For: 2.0.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Problem:
> Issue 1:
> When trying to create a datamap with a partition column inside a timeseries UDF, an exception is thrown.
> Issue 2:
> While a create datamap operation was in progress, the JDBC application was killed. On restart, a "datamap table not found" exception is thrown.





[jira] [Created] (CARBONDATA-3636) Timeseries query is not hitting datamap if granularity in query is case insensitive

2019-12-28 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3636:


 Summary: Timeseries query is not hitting datamap if granularity in 
query is case insensitive 
 Key: CARBONDATA-3636
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3636
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3629) Fix Select query failure on aggregation of same column on MV

2019-12-24 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3629:


 Summary: Fix Select query failure on aggregation of same column on 
MV
 Key: CARBONDATA-3629
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3629
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3624) Support creating MV datamap without giving filter columns in projection and bug fixes

2019-12-18 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3624:


 Summary: Support creating MV datamap without giving filter columns 
in projection and bug fixes
 Key: CARBONDATA-3624
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3624
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3601) Show Segments displays wrong Index size for Partition table with Merge Index Enabled

2019-12-03 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3601:


 Summary: Show Segments displays wrong Index size for Partition 
table with Merge Index Enabled
 Key: CARBONDATA-3601
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3601
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3600) Fix creating mv timeseries UDF column as partition column

2019-12-03 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3600:


 Summary: Fix creating mv timeseries UDF column as partition column
 Key: CARBONDATA-3600
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3600
 Project: CarbonData
  Issue Type: Bug
Reporter: Indhumathi Muthumurugesh








[jira] [Created] (CARBONDATA-3594) Optimize getSplits() during compaction

2019-11-24 Thread Indhumathi Muthumurugesh (Jira)
Indhumathi Muthumurugesh created CARBONDATA-3594:


 Summary: Optimize getSplits() during compaction
 Key: CARBONDATA-3594
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3594
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[jira] [Updated] (CARBONDATA-3584) Select Query fails for Boolean dictionary column when Codegen is false

2019-11-15 Thread Indhumathi Muthumurugesh (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Indhumathi Muthumurugesh updated CARBONDATA-3584:
-
Attachment: codegen.png

> Select Query fails for Boolean dictionary column when Codegen is false
> --
>
> Key: CARBONDATA-3584
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3584
> Project: CarbonData
>  Issue Type: Test
>Reporter: Indhumathi Muthumurugesh
>Priority: Major
> Attachments: codegen.png
>
>





