[jira] [Updated] (CARBONDATA-4243) Select filter query with to_date in filter fails for table with column_meta_cache configured also having SI

2021-08-23 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat updated CARBONDATA-4243:

Environment: Spark 3.1.1, Spark 2.4.5  (was: Spark 3.1.1)

> Select filter query with to_date in filter fails for table with 
> column_meta_cache configured also having SI
> ---
>
> Key: CARBONDATA-4243
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4243
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.2.0
> Environment: Spark 3.1.1, Spark 2.4.5
>Reporter: Chetan Bhat
>Priority: Minor
>
> Create a table with column_meta_cache configured, create secondary indexes, and load data 
> into the table. 
> Then execute a select filter query with to_date in the filter.
> CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION 
> string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 
> bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 
> decimal(36,36),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 
> int) stored as carbondata 
> TBLPROPERTIES('COLUMN_META_CACHE'='CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ');
>  CREATE INDEX indextable2 ON TABLE uniqdata (DOB) AS 'carbondata';
>  CREATE INDEX indextable3 ON TABLE uniqdata (DOJ) AS 'carbondata';
>  LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table 
> uniqdata OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
>  
> *Issue: Select filter query with to_date in filter fails for table with 
> column_meta_cache configured also having SI*
> 0: jdbc:hive2://10.21.19.14:23040/default> select 
> max(to_date(DOB)),min(to_date(DOB)),count(to_date(DOB)) from uniqdata where 
> to_date(DOB)='1975-06-11' or to_date(DOB)='1975-06-23';
>  Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, 
> tree:
>  !BroadCastSIFilterPushJoin [none#0|#0], [none#1|#1], Inner, BuildRight
>  :- *(6) ColumnarToRow
>  : +- Scan CarbonDatasourceHadoopRelation chetan.uniqdata[dob#847024|#847024] 
> Batched: true, DirectScan: false, PushedFilters: [((cast(input[0] as date) = 
> 1987) or (cast(in9))], ReadSchema: [dob]
>  +- *(8) HashAggregate(keys=[positionReference#847161|#847161], functions=[], 
> output=[positionReference#847161|#847161])
>  +- ReusedExchange [positionReference#847161|#847161], Exchange 
> hashpartitioning(positionReference#847161, 200), ENSURE_REQUIREMENTS, 
> [id=#195473|#195473]
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: 
> makeCopy, tree:
>  !BroadCastSIFilterPushJoin [none#0|#0], [none#1|#1], Inner, BuildRight
>  :- *(6)

[jira] [Reopened] (CARBONDATA-4241) if the sort scope is changed to global sort and data loaded, major compaction fails

2021-08-23 Thread Chetan Bhat (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Bhat reopened CARBONDATA-4241:
-

Issue still exists with Spark 3.1.1 jars in 
https://dist.apache.org/repos/dist/release/carbondata/2.2.0/

> if the sort scope is changed to global sort and data loaded, major compaction 
> fails
> ---
>
> Key: CARBONDATA-4241
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4241
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.2.0
> Environment: Spark 2.3.2 Carbon 1.6.1 , Spark 3.1.1 Carbon 2.2.0
>Reporter: Chetan Bhat
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
> Fix For: 2.2.0
>
>
> *Scenario 1: create a table with 'table_page_size_inmb'='1', load data, set 
> sort_scope to global_sort, load data again, and do major compaction.*
> 0: jdbc:hive2://10.21.19.14:23040/default> CREATE TABLE uniqdata_pagesize 
> (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ 
> timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10), DECIMAL_COLUMN2 decimal(36,36),Double_COLUMN1 double, 
> Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED as carbondata 
> TBLPROPERTIES('table_page_size_inmb'='1');
> +-+
> | Result  |
> +-+
> +-+
> No rows selected (0.229 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_pagesize 
> OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> +-+
> | Segment ID  |
> +-+
> | 0   |
> +-+
> 1 row selected (1.016 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> alter table uniqdata_pagesize set 
> tblproperties('sort_columns'='CUST_ID','sort_scope'='global_sort');
> +-+
> | Result  |
> +-+
> +-+
> No rows selected (0.446 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 
> 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_pagesize 
> OPTIONS('DELIMITER'=',' , 
> 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> +-+
> | Segment ID  |
> +-+
> | 1   |
> +-+
> 1 row selected (0.767 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> alter table uniqdata_pagesize 
> compact 'major';
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs 
> for more info. Exception in compaction Compaction Failure in Merger Rdd.
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>     at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoo

[jira] [Created] (CARBONDATA-4274) Create partition table error with spark 3.1

2021-08-23 Thread SHREELEKHYA GAMPA (Jira)
SHREELEKHYA GAMPA created CARBONDATA-4274:
-

 Summary:  Create partition table error with spark 3.1
 Key: CARBONDATA-4274
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4274
 Project: CarbonData
  Issue Type: Bug
Reporter: SHREELEKHYA GAMPA


With Spark 3.1, we can create a partition table by giving partition columns 
from the schema, like the example below:
{{create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as 
carbondata partitioned by (v2,c2)}}

When the table is created by a SparkSession with CarbonExtensions, the catalog 
table is created with the specified partitions.
But in the cluster / with CarbonSession, when we create a partition table with 
the above syntax, it creates a normal table with no partitions.
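
One way to check whether the catalog actually registered the partitions (a sketch, assuming the partitionTable from the example above) is to inspect the table from the SQL shell:

-- A correctly created partition table should list v2, c2 in the partition information section
DESCRIBE FORMATTED partitionTable;

-- Expected to fail with an AnalysisException if the table was created as a normal, non-partitioned table
SHOW PARTITIONS partitionTable;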





[jira] [Comment Edited] (CARBONDATA-4195) Materialized view loading time increased due to full refresh

2021-08-23 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403524#comment-17403524
 ] 

Nihal kumar ojha edited comment on CARBONDATA-4195 at 8/24/21, 5:14 AM:


 Hi, can you please provide the create MV command?

Whether the MV is created with incremental or full refresh depends on that 
command. If your query contains the avg() aggregate function or an expression 
like sum(col1) + sum(col2), then the MV will be created with full refresh. So 
once we have that command we can conclude.

Or, if it is a duplicate of 
[CARBONDATA-4239|https://issues.apache.org/jira/browse/CARBONDATA-4239], then 
please close this issue, as we are already tracking that one.
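
(For illustration, a minimal sketch of the distinction described above, using a hypothetical sales table with columns country and price; the refresh mode actually chosen still depends on the full command.)

-- Simple sum() aggregation: a candidate for incremental refresh
CREATE MATERIALIZED VIEW mv_sales_sum AS
  SELECT country, sum(price) FROM sales GROUP BY country;

-- avg(), like expressions such as sum(col1) + sum(col2), forces a full refresh
CREATE MATERIALIZED VIEW mv_sales_avg AS
  SELECT country, avg(price) FROM sales GROUP BY country;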


was (Author: nihal):
 Hi, can you please provide the create MV command?

Whether the MV is created with incremental or full refresh depends on that 
command. If your query contains the avg() aggregate function or an expression 
like sum(col1) + sum(col2), then the MV will be created with full refresh. So 
once we have that command we can conclude.

> Materialized view loading time increased due to full refresh
> 
>
> Key: CARBONDATA-4195
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Mayuri Patole
>Priority: Major
> Fix For: 2.1.0
>
>
> Hi Team,
> We are using carbon 2.1.0 in our project where parallel data loading is 
> happening.
> We are working on getting optimal performance for aggregated queries using 
> materialized views.
> We observed that continuous data loading and full refresh of the MV are causing 
> increased load time and high memory usage, which doesn't have to be this way.
> Can you suggest a way to perform incremental refresh, since we do not need 
> to recalculate old data while loading?
>  
>  
>  





[jira] [Commented] (CARBONDATA-4195) Materialized view loading time increased due to full refresh

2021-08-23 Thread Nihal kumar ojha (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403524#comment-17403524
 ] 

Nihal kumar ojha commented on CARBONDATA-4195:
--

 Hi, can you please provide the create MV command?

Whether the MV is created with incremental or full refresh depends on that 
command. If your query contains the avg() aggregate function or an expression 
like sum(col1) + sum(col2), then the MV will be created with full refresh. So 
once we have that command we can conclude.

> Materialized view loading time increased due to full refresh
> 
>
> Key: CARBONDATA-4195
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Mayuri Patole
>Priority: Major
> Fix For: 2.1.0
>
>
> Hi Team,
> We are using carbon 2.1.0 in our project where parallel data loading is 
> happening.
> We are working on getting optimal performance for aggregated queries using 
> materialized views.
> We observed that continuous data loading and full refresh of the MV are causing 
> increased load time and high memory usage, which doesn't have to be this way.
> Can you suggest a way to perform incremental refresh, since we do not need 
> to recalculate old data while loading?
>  
>  
>  





[GitHub] [carbondata] nihal0107 commented on issue #4206: Cannot create table with partitions in Spark in EMR

2021-08-23 Thread GitBox


nihal0107 commented on issue #4206:
URL: https://github.com/apache/carbondata/issues/4206#issuecomment-903767216


   Hi, as you can see, the error message is `partition is not supported for 
external table`.
   Whenever you create a table with a location, it becomes an external table, 
and partitions are not supported for external tables. Partitions are only 
supported for transactional tables. Please go through the other details about 
partitions at
   
https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#partition
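
   (For illustration, a hedged sketch of the distinction; the table names, columns, and HDFS path below are hypothetical.)

   -- Transactional (managed) table: partitioning is supported
   CREATE TABLE sales_managed (id int, name string)
     PARTITIONED BY (country string)
     STORED AS carbondata;

   -- A table created with LOCATION becomes an external table: the same
   -- PARTITIONED BY clause is rejected ("partition is not supported for external table")
   CREATE TABLE sales_external (id int, name string)
     PARTITIONED BY (country string)
     STORED AS carbondata
     LOCATION 'hdfs://hacluster/user/hypothetical/sales_external';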


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org