[jira] [Updated] (CARBONDATA-4243) Select filter query with to_date in filter fails for table with column_meta_cache configured also having SI
[ https://issues.apache.org/jira/browse/CARBONDATA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Bhat updated CARBONDATA-4243:
Environment: Spark 3.1.1, Spark 2.4.5 (was: Spark 3.1.1)

> Select filter query with to_date in filter fails for table with column_meta_cache configured also having SI
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4243
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4243
>             Project: CarbonData
>          Issue Type: Bug
>          Components: sql
>    Affects Versions: 2.2.0
>         Environment: Spark 3.1.1, Spark 2.4.5
>            Reporter: Chetan Bhat
>            Priority: Minor
>
> Create a table with column_meta_cache, create secondary indexes, and load data into the table.
> Then execute a select filter query with to_date in the filter.
>
> CREATE TABLE uniqdata (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) stored as carbondata TBLPROPERTIES('COLUMN_META_CACHE'='CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ');
> CREATE INDEX indextable2 ON TABLE uniqdata (DOB) AS 'carbondata';
> CREATE INDEX indextable3 ON TABLE uniqdata (DOJ) AS 'carbondata';
> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
>
> *Issue: Select filter query with to_date in filter fails for table with column_meta_cache configured also having SI*
>
> 0: jdbc:hive2://10.21.19.14:23040/default> select max(to_date(DOB)), min(to_date(DOB)), count(to_date(DOB)) from uniqdata where to_date(DOB)='1975-06-11' or to_date(DOB)='1975-06-23';
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree:
> !BroadCastSIFilterPushJoin [none#0|#0], [none#1|#1], Inner, BuildRight
> :- *(6) ColumnarToRow
> :  +- Scan CarbonDatasourceHadoopRelation chetan.uniqdata[dob#847024|#847024] Batched: true, DirectScan: false, PushedFilters: [((cast(input[0] as date) = 1987) or (cast(in9))], ReadSchema: [dob]
> +- *(8) HashAggregate(keys=[positionReference#847161|#847161], functions=[], output=[positionReference#847161|#847161])
>    +- ReusedExchange [positionReference#847161|#847161], Exchange hashpartitioning(positionReference#847161, 200), ENSURE_REQUIREMENTS, [id=#195473|#195473]
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
> at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: makeCopy, tree:
> !BroadCastSIFilterPushJoin [none#0|#0], [none#1|#1], Inner, BuildRight
> :- *(6)
[jira] [Reopened] (CARBONDATA-4241) if the sort scope is changed to global sort and data loaded, major compaction fails
[ https://issues.apache.org/jira/browse/CARBONDATA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chetan Bhat reopened CARBONDATA-4241:
-------------------------------------
Issue still exists with the Spark 3.1.1 jars in https://dist.apache.org/repos/dist/release/carbondata/2.2.0/

> if the sort scope is changed to global sort and data loaded, major compaction fails
> -----------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4241
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4241
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 2.2.0
>         Environment: Spark 2.3.2 Carbon 1.6.1, Spark 3.1.1 Carbon 2.2.0
>            Reporter: Chetan Bhat
>            Assignee: Indhumathi Muthumurugesh
>            Priority: Major
>             Fix For: 2.2.0
>
> *Scenario 1: create table with 'table_page_size_inmb'='1', load data, set sort_scope to global_sort, load data again, and do major compaction.*
>
> 0: jdbc:hive2://10.21.19.14:23040/default> CREATE TABLE uniqdata_pagesize (CUST_ID int, CUST_NAME String, ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint, BIGINT_COLUMN2 bigint, DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,36), Double_COLUMN1 double, Double_COLUMN2 double, INTEGER_COLUMN1 int) STORED as carbondata TBLPROPERTIES('table_page_size_inmb'='1');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.229 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_pagesize OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> +-------------+
> | Segment ID  |
> +-------------+
> | 0           |
> +-------------+
> 1 row selected (1.016 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> alter table uniqdata_pagesize set tblproperties('sort_columns'='CUST_ID','sort_scope'='global_sort');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.446 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> LOAD DATA INPATH 'hdfs://hacluster/chetan/2000_UniqData.csv' into table uniqdata_pagesize OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"', 'BAD_RECORDS_ACTION'='FORCE', 'FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
> +-------------+
> | Segment ID  |
> +-------------+
> | 1           |
> +-------------+
> 1 row selected (0.767 seconds)
> 0: jdbc:hive2://10.21.19.14:23040/default> alter table uniqdata_pagesize compact 'major';
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
> at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoo
[jira] [Created] (CARBONDATA-4274) Create partition table error with spark 3.1
SHREELEKHYA GAMPA created CARBONDATA-4274:
------------------------------------------

             Summary: Create partition table error with spark 3.1
                 Key: CARBONDATA-4274
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4274
             Project: CarbonData
          Issue Type: Bug
            Reporter: SHREELEKHYA GAMPA

With Spark 3.1, we can create a partition table by giving partition columns from the schema, as in the example below:

{{create table partitionTable(c1 int, c2 int, v1 string, v2 string) stored as carbondata partitioned by (v2,c2)}}

When the table is created by a SparkSession with CarbonExtension, the catalog table is created with the specified partitions. But in a cluster / with a carbon session, creating a partition table with the above syntax creates a normal table with no partitions.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
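To make the syntax difference in the report above concrete, the two partition-table DDL styles can be sketched as follows (a hedged illustration only; `partitionTableOld` and its columns are made-up names, and the first statement is the one from the report):

```sql
-- Spark 3.1 style (the report's example): partition columns are part of
-- the table schema and are referenced by name in PARTITIONED BY.
create table partitionTable(c1 int, c2 int, v1 string, v2 string)
stored as carbondata
partitioned by (v2, c2);

-- Classic Hive-style DDL: partition columns are declared with their data
-- types only in the PARTITIONED BY clause and do not appear in the schema.
create table partitionTableOld(c1 int, v1 string)
stored as carbondata
partitioned by (v2 string, c2 int);
```

Per the report, only the first form behaves inconsistently: it yields a partitioned catalog table under SparkSession with CarbonExtension but a plain, unpartitioned table in the cluster / carbon-session path.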
[jira] [Comment Edited] (CARBONDATA-4195) Materialized view loading time increased due to full refresh
[ https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403524#comment-17403524 ]

Nihal kumar ojha edited comment on CARBONDATA-4195 at 8/24/21, 5:14 AM:

Hi, can you please provide the create MV command? Whether the MV is created with incremental or full refresh depends on that command. If your query contains the avg() aggregate function or an expression such as sum(col1) + sum(col2), the MV will be created with full refresh. Once we have the command we can conclude. Or, if this is a duplicate of [CARBONDATA-4239|https://issues.apache.org/jira/browse/CARBONDATA-4239], then please close this issue, as we are already tracking that one.

was (Author: nihal): Hi, can you please provide the create MV command? Based on that only MV will be created with incremental or full refresh. If your query contains avg() aggregate function or some expression like sum(col1) + sum(col2) then MV will be created with full refresh. So once we have that command then we can conclude.

> Materialized view loading time increased due to full refresh
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-4195
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0
>            Reporter: Mayuri Patole
>            Priority: Major
>             Fix For: 2.1.0
>
> Hi Team,
> We are using Carbon 2.1.0 in our project, where parallel data loading is happening.
> We are working on getting optimal performance for aggregated queries using materialized views.
> We observed that continuous data loading and full refresh of the MV cause increased load time and high memory usage, which doesn't have to be this way.
> Can you suggest a way to perform an incremental refresh, since we do not need to recalculate old data on each load?
[jira] [Commented] (CARBONDATA-4195) Materialized view loading time increased due to full refresh
[ https://issues.apache.org/jira/browse/CARBONDATA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403524#comment-17403524 ]

Nihal kumar ojha commented on CARBONDATA-4195:
----------------------------------------------

Hi, can you please provide the create MV command? Whether the MV is created with incremental or full refresh depends on that command. If your query contains the avg() aggregate function or an expression such as sum(col1) + sum(col2), the MV will be created with full refresh. Once we have the command we can conclude.

> Materialized view loading time increased due to full refresh
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-4195
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4195
>             Project: CarbonData
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.1.0
>            Reporter: Mayuri Patole
>            Priority: Major
>             Fix For: 2.1.0
>
> Hi Team,
> We are using Carbon 2.1.0 in our project, where parallel data loading is happening.
> We are working on getting optimal performance for aggregated queries using materialized views.
> We observed that continuous data loading and full refresh of the MV cause increased load time and high memory usage, which doesn't have to be this way.
> Can you suggest a way to perform an incremental refresh, since we do not need to recalculate old data on each load?
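The distinction drawn in the comment above can be illustrated with a hedged sketch (table name `t`, MV names, and columns are invented for illustration; the exact MV DDL accepted by your CarbonData version may differ):

```sql
-- Candidate for incremental refresh: sum() and count() over newly loaded
-- segments can be merged into the already-materialized aggregates.
CREATE MATERIALIZED VIEW mv_incremental AS
  SELECT col1, sum(col2), count(col2) FROM t GROUP BY col1;

-- Forces full refresh per the comment above: avg(), or an expression over
-- aggregates such as sum(col1) + sum(col2), is rebuilt from scratch on
-- every load, which matches the reported load-time and memory growth.
CREATE MATERIALIZED VIEW mv_full AS
  SELECT col1, avg(col2) FROM t GROUP BY col1;
```

So one possible mitigation, under these assumptions, is to materialize sum() and count() separately and compute the average in the query, keeping the MV eligible for incremental refresh.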
[GitHub] [carbondata] nihal0107 commented on issue #4206: Cannot create table with partitions in Spark in EMR
nihal0107 commented on issue #4206:
URL: https://github.com/apache/carbondata/issues/4206#issuecomment-903767216

Hi, as the error message says, `partition is not supported for external table`. Whenever you create a table with a location, it is an external table, and we do not support partitioning for external tables; partitioning is supported only for transactional tables. Please see the partition documentation for more details: https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md#partition

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@carbondata.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
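The external-vs-transactional distinction in the comment above can be sketched as follows (table names and the HDFS path are illustrative, not from the issue):

```sql
-- External table: the explicit LOCATION makes it external, so the
-- PARTITIONED BY clause is rejected with
-- "partition is not supported for external table".
create table ext_t(c1 int, v1 string)
stored as carbondata
partitioned by (v2 string)
location 'hdfs://hacluster/user/data/ext_t';

-- Transactional table: no LOCATION, data is managed in the warehouse
-- path, and partitioning is supported.
create table txn_t(c1 int, v1 string)
stored as carbondata
partitioned by (v2 string);
```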