[jira] [Created] (CARBONDATA-4228) Deleted records are reappearing in select queries from the alter-added carbon segments after delete and update operations.
Prasanna Ravichandran created CARBONDATA-4228:
-
Summary: Deleted records are reappearing in select queries from the alter-added carbon segments after delete and update operations.
Key: CARBONDATA-4228
URL: https://issues.apache.org/jira/browse/CARBONDATA-4228
Project: CarbonData
Issue Type: Bug
Affects Versions: 1.6.1
Reporter: Prasanna Ravichandran

Deleted records are not removed and still appear in select queries on the alter-added carbon segments after delete and update operations.

Test queries:

drop table uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
--Create a copy of the files from the first segment:
--hdfs dfs -rm -r -f /uniq1/*;
--hdfs dfs -mkdir -p /uniq1/
--hdfs dfs -cp /user/hive/warehouse/carbon.store/rps/uniqdata/Fact/Part0/Segment_0/* /uniq1/;
--hdfs dfs -ls /uniq1/;
use rps;
Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniq1/','format'='carbon');
--update and delete complete without throwing an error, but they do not take effect on the added carbon segments;
delete from uniqdata where cust_id=9001;
update uniqdata set (cust_name)=('Rahu') where cust_id=1;
set carbon.input.segments.rps.uniqdata=1; --the first segment represents the added segment;
select cust_name from uniqdata where cust_id=1; --returns CUST_NAME_01000 - incorrect, the value should be Rahu;
select count(*) from uniqdata where cust_id=9001; --returns 1 - incorrect, should be 0 as the cust_id 9001 records were deleted by the delete statement;
reset;

Console:
> Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniq1/','format'='carbon');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.226 seconds)
> delete from uniqdata where cust_id=9001;
INFO : Execution ID: 139
+--------------------+
| Deleted Row Count  |
+--------------------+
| 2                  |
+--------------------+
1 row selected (5.321 seconds)
> update uniqdata set (cust_name)=('Rahu') where cust_id=1;
INFO : Execution ID: 142
+--------------------+
| Updated Row Count  |
+--------------------+
| 2                  |
+--------------------+
1 row selected (7.938 seconds)
> set carbon.input.segments.rps.uniqdata=1;
+-------------------------------------+--------+
| key                                 | value  |
+-------------------------------------+--------+
| carbon.input.segments.rps.uniqdata  | 1      |
+-------------------------------------+--------+
1 row selected (0.05 seconds)
> --First segment represents the added segment;
> select cust_name from uniqdata where cust_id=1; --CUST_NAME_01000 - incorrect, the value should be Rahu;
INFO : Execution ID: 147
+------------------+
| cust_name        |
+------------------+
| CUST_NAME_01000  |
+------------------+
1 row selected (0.93 seconds)
> select count(*) from uniqdata where cust_id=9001; --returns 1 - incorrect, should be 0 as the cust_id 9001 records are deleted;
INFO : Execution ID: 148
+-----------+
| count(1)  |
+-----------+
| 1         |
+-----------+
1 row selected (1.149 seconds)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (CARBONDATA-4215) When carbon.enable.vector.reader=false, adding a parquet segment through alter add segment to a carbon table makes count(*) fail with an error.
Prasanna Ravichandran created CARBONDATA-4215:
-
Summary: When carbon.enable.vector.reader=false, adding a parquet segment through alter add segment to a carbon table makes count(*) fail with an error.
Key: CARBONDATA-4215
URL: https://issues.apache.org/jira/browse/CARBONDATA-4215
Project: CarbonData
Issue Type: Bug
Affects Versions: 2.1.1
Environment: 3 node FI
Reporter: Prasanna Ravichandran

When carbon.enable.vector.reader=false, adding a parquet segment through alter add segment to a carbon table makes count(*) fail with a ClassCastException.

Test queries:

--set carbon.enable.vector.reader=false in carbon.properties;
use default;
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
drop table if exists uniqdata_parquet;
CREATE TABLE uniqdata_parquet (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as parquet;
insert into uniqdata_parquet select * from uniqdata;
create database if not exists test;
use test;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
Alter table uniqdata add segment options ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
select count(*) from uniqdata; -- throws a ClassCastException;

Error log traces:

java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:58)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:413)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1551)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-06-19 13:50:59,035 | WARN | task-result-getter-2 | Lost task 0.0 in stage 4.0 (TID 28, localhost, executor driver): java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at
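The trace shows whole-stage generated per-row code receiving a whole ColumnarBatch where it expects a single InternalRow. The sketch below uses hypothetical stand-in classes (not Spark's real types) to illustrate that failure pattern in isolation:

```java
// Stand-in classes (illustrative only, not Spark's actual types) for the
// reported failure: with carbon.enable.vector.reader=false, per-row code
// still receives batch objects from the scan, and the generated cast fails.
public class VectorMismatch {
    static class InternalRow {}    // stand-in for org.apache.spark.sql.catalyst.InternalRow
    static class ColumnarBatch {}  // stand-in for org.apache.spark.sql.vectorized.ColumnarBatch

    // The generated aggregate loop effectively performs this cast per element.
    static InternalRow asRow(Object fromScan) {
        return (InternalRow) fromScan;  // ClassCastException when a batch arrives
    }

    public static void main(String[] args) {
        try {
            asRow(new ColumnarBatch());
        } catch (ClassCastException e) {
            // Mirrors "ColumnarBatch cannot be cast to InternalRow" in the log.
            System.out.println("ClassCastException, as in the bug report");
        }
    }
}
```

The fix direction suggested by the trace is that the reader for the added parquet segment must honor the disabled vector reader and emit rows, or the plan must keep batch-aware codegen; which side is wrong is for the project to decide.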
[jira] [Created] (CARBONDATA-4204) When the path is empty in Carbon add segment, a "String index out of range" error is thrown.
Prasanna Ravichandran created CARBONDATA-4204:
-
Summary: When the path is empty in Carbon add segment, a "String index out of range" error is thrown.
Key: CARBONDATA-4204
URL: https://issues.apache.org/jira/browse/CARBONDATA-4204
Project: CarbonData
Issue Type: Bug
Affects Versions: 2.1.1
Environment: 3 node FI cluster
Reporter: Prasanna Ravichandran

Test queries:

CREATE TABLE uniqdata(cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
Alter table uniqdata add segment options ('path'='','format'='carbon');

Error:

org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.charAt(String.java:658)
at org.apache.spark.sql.execution.command.management.CarbonAddLoadCommand.processMetadata(CarbonAddLoadCommand.scala:93)
at org.apache.spark.sql.execution.command.MetadataCommand.$anonfun$run$1(package.scala:137)
at org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
at org.apache.spark.sql.execution.command.Auditable.runWithAudit$(package.scala:114)
at org.apache.spark.sql.execution.command.MetadataCommand.runWithAudit(package.scala:134)
at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:137)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:80)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3695)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
at
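The causing frame is String.charAt inside CarbonAddLoadCommand.processMetadata. A minimal, hypothetical reconstruction of how an empty 'path' option can yield index -1 (the helper below is an illustrative assumption, not CarbonData's actual code):

```java
// Hypothetical path-normalisation step of the kind that would produce the
// reported StringIndexOutOfBoundsException: inspecting the last character of
// the user-supplied segment path without guarding against an empty string.
public class EmptyPathCheck {
    // Strip a trailing '/' from the path, if present (illustrative helper).
    static String stripTrailingSlash(String path) {
        // For path = "", path.length() - 1 is -1, so charAt throws
        // StringIndexOutOfBoundsException, matching "String index out of range: -1".
        if (path.charAt(path.length() - 1) == '/') {
            return path.substring(0, path.length() - 1);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSlash("hdfs://hacluster/uniq1/"));
        stripTrailingSlash(""); // throws, as in the bug report
    }
}
```

Validating the option up front (rejecting an empty path with a clear error message before any character inspection) would avoid surfacing the raw index error to the user.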
[jira] [Updated] (CARBONDATA-4203) Compaction on alter-added SDK segments fails after update and delete operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanna Ravichandran updated CARBONDATA-4203:
--
Description:
Compaction on SDK segments added through alter add segment fails after update and delete operations. The issue occurs only when delete and update are performed on one of the added segments; without a delete and update on a segment the issue is not seen.

Place the attached SDK files in the /sdkfiles/primitive/, /sdkfiles/primitive2/, /sdkfiles/primitive3/, /sdkfiles/primitive4/ and /sdkfiles/primitive5/ folders in HDFS and then execute the below queries.

Test queries:

drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
insert into external_primitive select 1,"Pr",1,10,true,"1992-12-09","1992-10-07 22:00:20.0","chennai","CSE";
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
delete from external_primitive where id =2;
update external_primitive set (name)=("RAMU") where name="CCC";
drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
alter table external_primitive compact 'minor';

!image-2021-06-08-16-54-52-412.png!

Error traces:

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
at org.apache.spark.sql.util.CarbonException$.analysisException(CarbonException.scala:23)
at org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.$anonfun$processData$3(CarbonAlterTableCompactionCommand.scala:197)
at org.apache.carbondata.events.package$.withEvents(package.scala:27)
at org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.processData(CarbonAlterTableCompactionCommand.scala:185)
at org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
at org.apache.spark.sql.execution.command.Auditable.runWithAudit(package.scala:118)
at
[jira] [Updated] (CARBONDATA-4203) Compaction on alter-added SDK segments fails after update and delete operations.
[ https://issues.apache.org/jira/browse/CARBONDATA-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanna Ravichandran updated CARBONDATA-4203:
--
Attachment: primitive- SDK files.rar
> Compaction on alter-added SDK segments fails after update and delete operations.
> -
>
> Key: CARBONDATA-4203
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4203
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.1.1
> Environment: FI cluster - 3 node
> Reporter: Prasanna Ravichandran
> Priority: Major
> Attachments: primitive- SDK files.rar
[jira] [Created] (CARBONDATA-4203) Compaction on alter-added SDK segments fails after update and delete operations.
Prasanna Ravichandran created CARBONDATA-4203:
-
Summary: Compaction on alter-added SDK segments fails after update and delete operations.
Key: CARBONDATA-4203
URL: https://issues.apache.org/jira/browse/CARBONDATA-4203
Project: CarbonData
Issue Type: Bug
Affects Versions: 2.1.1
Environment: FI cluster - 3 node
Reporter: Prasanna Ravichandran

Compaction on SDK segments added through alter add segment fails after update and delete operations. The issue occurs only when delete and update are performed on one of the added segments; without a delete and update on a segment the issue is not seen.

Test queries:

drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
insert into external_primitive select 1,"Pr",1,10,true,"1992-12-09","1992-10-07 22:00:20.0","chennai","CSE";
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
delete from external_primitive where id =2;
update external_primitive set (name)=("RAMU") where name="CCC";
drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon');
alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon');
alter table external_primitive compact 'minor';

!image-2021-06-08-16-54-52-412.png!

Error traces:

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: Compaction failed. Please check logs for more info. Exception in compaction Compaction Failure in Merger Rdd.
at org.apache.spark.sql.util.CarbonException$.analysisException(CarbonException.scala:23)
at org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.$anonfun$processData$3(CarbonAlterTableCompactionCommand.scala:197)
at org.apache.carbondata.events.package$.withEvents(package.scala:27)
at org.apache.spark.sql.execution.command.management.CarbonAlterTableCompactionCommand.processData(CarbonAlterTableCompactionCommand.scala:185)
at org.apache.spark.sql.execution.command.AtomicRunnableCommand.$anonfun$run$3(package.scala:162)
at
[jira] [Closed] (CARBONDATA-4021) With Index server running, count(*) throws an error after adding the parquet and ORC segments.
[ https://issues.apache.org/jira/browse/CARBONDATA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanna Ravichandran closed CARBONDATA-4021.
-
Resolution: Not A Problem
> With Index server running, count(*) throws an error after adding the parquet and ORC segments.
> ---
>
> Key: CARBONDATA-4021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4021
> Project: CarbonData
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Prasanna Ravichandran
> Priority: Major
>
> We are getting the below issues when index server is enabled and index server fallback is disabled (both configured as true). With count(*) we get the below error after adding the parquet and ORC segments.
>
> Queries and error:
>
> use rps;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.054 seconds)
> drop table if exists uniqdata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.229 seconds)
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.756 seconds)
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO : Execution ID: 95
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (2.789 seconds)
> use default;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.052 seconds)
> drop table if exists uniqdata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.122 seconds)
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.508 seconds)
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO : Execution ID: 108
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.316 seconds)
> drop table if exists uniqdata_parquet;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.668 seconds)
> CREATE TABLE uniqdata_parquet (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as parquet;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.397 seconds)
> insert into uniqdata_parquet select * from uniqdata;
> INFO : Execution ID: 116
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (4.805 seconds)
> drop table if exists uniqdata_orc;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.553 seconds)
> CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) using orc;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.396 seconds)
> insert into uniqdata_orc select * from uniqdata;
> INFO : Execution ID: 122
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (3.403 seconds)
> use rps;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.06 seconds)
> Alter table uniqdata add segment options ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
> INFO : Execution ID: 126
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.511 seconds)
> Alter table uniqdata add segment options
[jira] [Comment Edited] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227384#comment-17227384 ] Prasanna Ravichandran edited comment on CARBONDATA-4029 at 11/6/20, 12:47 PM: -- Working fine and passed with the newly created SDK segments. Some old SDK segments with a future timestamp will be considered an invalid segment because of some other scenarios in delete/update. Old SDK files with future timestamp values cannot be fixed. was (Author: prasanna ravichandran): Working fine and passed with the newly created SDK segments. Some old SDK segments with a future timestamp will be considered an invalid segment because of some other scenarios in delete/update. Old SDK files with future timestamp values cannot be fixed. > After delete in the table which has Alter-added SDK segments, then the > count(*) is 0. > - > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug > Affects Versions: 2.0.0 > Environment: 3 node FI cluster > Reporter: Prasanna Ravichandran > Priority: Minor > Attachments: Primitive.rar > > > Perform a delete on a table which has alter-added SDK segments; count(*) then > returns 0. count(*) remains 0 even if any number of SDK segments are added after it. 
> Test queries: > drop table if exists external_primitive; > create table external_primitive (id int, name string, rank smallint, salary > double, active boolean, dob date, doj timestamp, city string, dept string) > stored as carbondata; > --before executing the below alter add segment-place the attached SDK files > in hdfs at /sdkfiles/primitive2 folder; > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select > * from external_primitive; > delete from external_primitive where id =2;select * from external_primitive; > Console output: > /> drop table if exists external_primitive; > +-+ > | Result | > +-+ > +-+ > No rows selected (1.586 seconds) > /> create table external_primitive (id int, name string, rank smallint, > salary double, active boolean, dob date, doj timestamp, city string, dept > string) stored as carbondata; > +-+ > | Result | > +-+ > +-+ > No rows selected (0.774 seconds) > /> alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select > * from external_primitive; > +-+ > | Result | > +-+ > +-+ > No rows selected (1.077 seconds) > INFO : Execution ID: 320 > +-+---+---+--+-+-++++ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+--+-+-++++ > | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune > | IT | > | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | > Bangalore | DATA | > | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | > Pune | DATA | > | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi > | MAINS | > | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi > | IT | > | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | > Bangalore | DATA | > | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | > IT | > | 8 | HHH | 2 | 
567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | > Bangalore | DATA | > | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune > | DATA | > | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | > Bangalore | MAINS | > | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | > Bangalore | IT | > | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | > DATA | > | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | > Bangalore | MAINS | > +-+---+---+--+-+-++++ > 13 rows selected (2.458 seconds) > /> delete from external_primitive where id =2;select * from > external_primitive; > INFO : Execution ID: 322 > ++ > | Deleted Row Count | > ++ > | 1 | > ++ > 1 row selected (3.723 seconds) > +-+---+---+-+-+--+--+---+---+ > | id | name | rank | salary | active | dob | doj | city | dept | >
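The console output above reduces to a short sequence (the SDK files from the attached Primitive.rar are assumed to be at the hdfs path used in the description):

```sql
-- Minimal repro for CARBONDATA-4029: after deleting one row from an
-- alter-added SDK segment, subsequent selects return zero rows.
drop table if exists external_primitive;
create table external_primitive (id int, name string, rank smallint, salary double,
  active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
alter table external_primitive add segment
  options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
delete from external_primitive where id = 2;      -- Deleted Row Count: 1
select count(*) from external_primitive;          -- bug: 0, expected 12
```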
[jira] [Commented] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227384#comment-17227384 ] Prasanna Ravichandran commented on CARBONDATA-4029: --- Working fine and passed with the newly created SDK segments. Some old SDK segments with a future timestamp will be considered an invalid segment because of some other scenarios in delete/update. Old SDK files with future timestamp values cannot be fixed. > After delete in the table which has Alter-added SDK segments, then the > count(*) is 0. > - > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug > Affects Versions: 2.0.0 > Environment: 3 node FI cluster > Reporter: Prasanna Ravichandran > Priority: Minor > Attachments: Primitive.rar > > > Perform a delete on a table which has alter-added SDK segments; count(*) then > returns 0. count(*) remains 0 even if any number of SDK segments are added after it. > Test queries: > drop table if exists external_primitive; > create table external_primitive (id int, name string, rank smallint, salary > double, active boolean, dob date, doj timestamp, city string, dept string) > stored as carbondata; > --before executing the below alter add segment, place the attached SDK files > in hdfs at /sdkfiles/primitive2 folder; > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select > * from external_primitive; > delete from external_primitive where id =2;select * from external_primitive; > Console output: > /> drop table if exists external_primitive; > +-+ > | Result | > +-+ > +-+ > No rows selected (1.586 seconds) > /> create table external_primitive (id int, name string, rank smallint, > salary double, active boolean, dob date, doj timestamp, city string, dept > string) stored as carbondata; > +-+ > | Result | > +-+ > +-+ > No rows selected (0.774 seconds) > /> alter table external_primitive 
add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select > * from external_primitive; > +-+ > | Result | > +-+ > +-+ > No rows selected (1.077 seconds) > INFO : Execution ID: 320 > +-+---+---+--+-+-++++ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+--+-+-++++ > | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune > | IT | > | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | > Bangalore | DATA | > | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | > Pune | DATA | > | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi > | MAINS | > | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi > | IT | > | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | > Bangalore | DATA | > | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | > IT | > | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | > Bangalore | DATA | > | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune > | DATA | > | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | > Bangalore | MAINS | > | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | > Bangalore | IT | > | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | > DATA | > | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | > Bangalore | MAINS | > +-+---+---+--+-+-++++ > 13 rows selected (2.458 seconds) > /> delete from external_primitive where id =2;select * from > external_primitive; > INFO : Execution ID: 322 > ++ > | Deleted Row Count | > ++ > | 1 | > ++ > 1 row selected (3.723 seconds) > +-+---+---+-+-+--+--+---+---+ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+-+-+--+--+---+---+ > +-+---+---+-+-+--+--+---+---+ > No rows selected (1.531 seconds) > /> alter table 
external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select > * from external_primitive; > +-+ > | Result | > +-+ >
[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-4029: -- Description: Perform a delete on a table which has alter-added SDK segments; count(*) then returns 0. count(*) remains 0 even if any number of SDK segments are added after it. Test queries: drop table if exists external_primitive; create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata; --before executing the below alter add segment, place the attached SDK files in hdfs at /sdkfiles/primitive2 folder; alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select * from external_primitive; delete from external_primitive where id =2;select * from external_primitive; Console output: /> drop table if exists external_primitive; +-+ | Result | +-+ +-+ No rows selected (1.586 seconds) /> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata; +-+ | Result | +-+ +-+ No rows selected (0.774 seconds) /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');select * from external_primitive; +-+ | Result | +-+ +-+ No rows selected (1.077 seconds) INFO : Execution ID: 320 +-+---+---+--+-+-++++ | id | name | rank | salary | active | dob | doj | city | dept | +-+---+---+--+-+-++++ | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune | IT | | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA | | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune | DATA | | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi | MAINS | | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 
2017-11-25 04:00:20.0 | Delhi | IT | | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA | | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | IT | | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA | | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune | DATA | | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS | | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT | | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | DATA | | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS | +-+---+---+--+-+-++++ 13 rows selected (2.458 seconds) /> delete from external_primitive where id =2;select * from external_primitive; INFO : Execution ID: 322 ++ | Deleted Row Count | ++ | 1 | ++ 1 row selected (3.723 seconds) +-+---+---+-+-+--+--+---+---+ | id | name | rank | salary | active | dob | doj | city | dept | +-+---+---+-+-+--+--+---+---+ +-+---+---+-+-+--+--+---+---+ No rows selected (1.531 seconds) /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon');select * from external_primitive; +-+ | Result | +-+ +-+ No rows selected (0.766 seconds) +-+---+---+-+-+--+--+---+---+ | id | name | rank | salary | active | dob | doj | city | dept | +-+---+---+-+-+--+--+---+---+ +-+---+---+-+-+--+--+---+---+ No rows selected (1.439 seconds) /> select count(*) from external_primitive; INFO : Execution ID: 335 +---+ | count(1) | +---+ | 0 | +---+ 1 row selected (1.278 seconds) /> > After delete in the table which has Alter-added SDK segments, then the > count(*) is 0. 
> - > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: 3 node FI cluster >Reporter: Prasanna Ravichandran >
[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-4029: -- Description: (was: We are getting Number format exception while querying on the date columns. Attached the SDK files also. Test queries: --SDK compaction; drop table if exists external_primitive; create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata; alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon'); alter table external_primitive compact 'minor'; --working fine pass; select count(*) from external_primitive;--working fine pass; show segments for table external_primitive; select * from external_primitive limit 13; --working fine pass; select * from external_primitive limit 14; --failed getting number format exception; select min(dob) from external_primitive; --failed getting number format exception; select max(dob) from external_primitive; --working; select dob from external_primitive; --failed getting number format exception; Console: *0: /> show segments for table external_primitive;* +--++--+--+++-+--+ | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | Index Size | File Format | +--++--+--+++-+--+ | 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | columnar_v3 | | 3 | Compacted | 2020-10-13 11:52:00.587 | 
0.828S | {} | 1.88KB | 655.0B | columnar_v3 | | 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | columnar_v3 | | 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | columnar_v3 | | 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | columnar_v3 | | 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | columnar_v3 | +--++--+--+++-+--+ 6 rows selected (0.45 seconds) *0: /> select * from external_primitive limit 13;* --working fine pass; INFO : Execution ID: 95 +-+---+---+--+-+-++++ | id | name | rank | salary | active | dob | doj | city | dept | +-+---+---+--+-+-++++ | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune | IT | | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | Bangalore | DATA | | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | Pune | DATA | | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi | MAINS | | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi | IT | | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | Bangalore | DATA | | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | IT | | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 09:30:20.0 | Bangalore | DATA | | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune | DATA | | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 09:30:20.0 | Bangalore | MAINS | | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 19:30:20.0 | Bangalore | IT | | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi | DATA | | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 09:30:20.0 | Bangalore | MAINS | +-+---+---+--+-+-++++ 13 rows selected (1.775 seconds) *0: /> select * from external_primitive limit 14;* --failed getting number format exception; INFO : Execution ID: 97 
*java.lang.NumberFormatException: For input string: "776"* at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:569) at java.lang.Integer.parseInt(Integer.java:615) at java.sql.Date.valueOf(Date.java:133) at
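Since the NumberFormatException surfaces only once rows beyond the first 13 are read, one hedged way to localize which alter-added SDK segment carries the badly encoded date values is to query segments in isolation with carbon.input.segments, as used elsewhere in this digest (the database name `rps` and the segment ids are assumptions):

```sql
-- Query each alter-added segment on its own; the exception from
-- java.sql.Date.valueOf should reproduce only for the segment(s)
-- whose dob encoding is bad.
set carbon.input.segments.rps.external_primitive=1;
select dob from external_primitive;
reset;
-- repeat with =2, =3, =4, =5 for the other added segments
```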
[jira] [Updated] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-4029: -- Summary: After delete in the table which has Alter-added SDK segments, then the count(*) is 0. (was: Getting Number format exception while querying on date columns in SDK carbon table.) > After delete in the table which has Alter-added SDK segments, then the > count(*) is 0. > - > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: 3 node FI cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: Primitive.rar > > > We are getting Number format exception while querying on the date columns. > Attached the SDK files also. > Test queries: > --SDK compaction; > drop table if exists external_primitive; > create table external_primitive (id int, name string, rank smallint, salary > double, active boolean, dob date, doj timestamp, city string, dept string) > stored as carbondata; > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon'); > > alter table external_primitive compact 'minor'; --working fine pass; > select count(*) from external_primitive;--working fine pass; > show segments for table external_primitive; > select * from external_primitive limit 13; --working fine pass; > select * from external_primitive limit 14; --failed 
getting number format > exception; > select min(dob) from external_primitive; --failed getting number format > exception; > select max(dob) from external_primitive; --working; > select dob from external_primitive; --failed getting number format exception; > Console: > *0: /> show segments for table external_primitive;* > +--++--+--+++-+--+ > | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | > Index Size | File Format | > +--++--+--+++-+--+ > | 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 3 | Compacted | 2020-10-13 11:52:00.587 | 0.828S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | > columnar_v3 | > | 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | > columnar_v3 | > +--++--+--+++-+--+ > 6 rows selected (0.45 seconds) > *0: /> select * from external_primitive limit 13;* --working fine pass; > INFO : Execution ID: 95 > +-+---+---+--+-+-++++ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+--+-+-++++ > | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune > | IT | > | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | > Bangalore | DATA | > | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | > Pune | DATA | > | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi > | MAINS | > | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi > | IT | > | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | > Bangalore | DATA | > | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | > IT | > | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 
09:30:20.0 | > Bangalore | DATA | > | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune > | DATA | > | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 09:30:20.0 | > Bangalore | MAINS | >
[jira] [Reopened] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran reopened CARBONDATA-4029: --- > Getting Number format exception while querying on date columns in SDK carbon > table. > --- > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: 3 node FI cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: Primitive.rar > > > We are getting Number format exception while querying on the date columns. > Attached the SDK files also. > Test queries: > --SDK compaction; > drop table if exists external_primitive; > create table external_primitive (id int, name string, rank smallint, salary > double, active boolean, dob date, doj timestamp, city string, dept string) > stored as carbondata; > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon'); > > alter table external_primitive compact 'minor'; --working fine pass; > select count(*) from external_primitive;--working fine pass; > show segments for table external_primitive; > select * from external_primitive limit 13; --working fine pass; > select * from external_primitive limit 14; --failed getting number format > exception; > select min(dob) from external_primitive; --failed getting number format > exception; > select max(dob) from external_primitive; --working; > select dob 
from external_primitive; --failed getting number format exception; > Console: > *0: /> show segments for table external_primitive;* > +--++--+--+++-+--+ > | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | > Index Size | File Format | > +--++--+--+++-+--+ > | 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 3 | Compacted | 2020-10-13 11:52:00.587 | 0.828S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | > columnar_v3 | > | 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | > columnar_v3 | > +--++--+--+++-+--+ > 6 rows selected (0.45 seconds) > *0: /> select * from external_primitive limit 13;* --working fine pass; > INFO : Execution ID: 95 > +-+---+---+--+-+-++++ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+--+-+-++++ > | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune > | IT | > | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | > Bangalore | DATA | > | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | > Pune | DATA | > | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi > | MAINS | > | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi > | IT | > | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | > Bangalore | DATA | > | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | > IT | > | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 09:30:20.0 | > Bangalore | DATA | > | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune > | DATA | > | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 
09:30:20.0 | > Bangalore | MAINS | > | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 19:30:20.0 | > Bangalore | IT | > | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi | > DATA | > | 20 |
[jira] [Closed] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-4029. - Resolution: Won't Fix > Getting Number format exception while querying on date columns in SDK carbon > table. > --- > > Key: CARBONDATA-4029 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 > Project: CarbonData > Issue Type: Bug >Affects Versions: 2.0.0 > Environment: 3 node FI cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: Primitive.rar > > > We are getting Number format exception while querying on the date columns. > Attached the SDK files also. > Test queries: > --SDK compaction; > drop table if exists external_primitive; > create table external_primitive (id int, name string, rank smallint, salary > double, active boolean, dob date, doj timestamp, city string, dept string) > stored as carbondata; > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon'); > alter table external_primitive add segment > options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon'); > > alter table external_primitive compact 'minor'; --working fine pass; > select count(*) from external_primitive;--working fine pass; > show segments for table external_primitive; > select * from external_primitive limit 13; --working fine pass; > select * from external_primitive limit 14; --failed getting number format > exception; > select min(dob) from external_primitive; --failed getting number format > exception; > select max(dob) from external_primitive; 
--working; > select dob from external_primitive; --failed getting number format exception; > Console: > *0: /> show segments for table external_primitive;* > +--++--+--+++-+--+ > | ID | Status | Load Start Time | Load Time Taken | Partition | Data Size | > Index Size | File Format | > +--++--+--+++-+--+ > | 4 | Success | 2020-10-13 11:52:04.012 | 0.511S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 3 | Compacted | 2020-10-13 11:52:00.587 | 0.828S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 2 | Compacted | 2020-10-13 11:51:57.767 | 0.775S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 1 | Compacted | 2020-10-13 11:51:54.678 | 1.024S | {} | 1.88KB | 655.0B | > columnar_v3 | > | 0.1 | Success | 2020-10-13 11:52:05.986 | 5.785S | {} | 9.62KB | 5.01KB | > columnar_v3 | > | 0 | Compacted | 2020-10-13 11:51:51.072 | 1.125S | {} | 8.55KB | 4.25KB | > columnar_v3 | > +--++--+--+++-+--+ > 6 rows selected (0.45 seconds) > *0: /> select * from external_primitive limit 13;* --working fine pass; > INFO : Execution ID: 95 > +-+---+---+--+-+-++++ > | id | name | rank | salary | active | dob | doj | city | dept | > +-+---+---+--+-+-++++ > | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-09 22:30:20.0 | Pune > | IT | > | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 09:30:20.0 | > Bangalore | DATA | > | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-11-30 23:50:20.0 | > Pune | DATA | > | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 04:30:20.0 | Delhi > | MAINS | > | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 01:30:20.0 | Delhi > | IT | > | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 02:30:20.0 | > Bangalore | DATA | > | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2016-12-31 23:30:20.0 | Pune | > IT | > | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 09:30:20.0 | > Bangalore | DATA | > | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-14 22:30:20.0 | Pune > | DATA | > | 10 | JJJ | 3 | 887877.14 | true | 
2000-05-19 | 2016-10-10 09:30:20.0 | > Bangalore | MAINS | > | 18 | | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 19:30:20.0 | > Bangalore | IT | > | 19 | | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-14 22:30:20.0 | Delhi
[jira] [Closed] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly create Hive write format - carbon table. We are getting “Database is n
[ https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3937. - > Insert into select from another carbon /parquet table is not working on Hive > Beeline on a newly create Hive write format - carbon table. We are getting > “Database is not set" error. > > > Key: CARBONDATA-3937 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3937 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 >Reporter: Prasanna Ravichandran >Priority: Major > > Insert into select from another carbon or parquet table to a carbon table is > not working on Hive Beeline on a newly create Hive write format carbon table. > We are getting “Database is not set” error. > > Test queries: > drop table if exists hive_carbon; > create table hive_carbon(id int, name string, scale decimal, country string, > salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; > insert into hive_carbon select 1,"Ram","2.3","India",3500; > insert into hive_carbon select 2,"Raju","2.4","Russia",3600; > insert into hive_carbon select 3,"Raghu","2.5","China",3700; > insert into hive_carbon select 4,"Ravi","2.6","Australia",3800; > > drop table if exists hive_carbon2; > create table hive_carbon2(id int, name string, scale decimal, country string, > salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; > insert into hive_carbon2 select * from hive_carbon; > select * from hive_carbon; > select * from hive_carbon2; > > --execute below queries in spark-beeline; > create table hive_table(id int, name string, scale decimal, country string, > salary double); > create table parquet_table(id int, name string, scale decimal, country > string, salary double) stored as parquet; > insert into hive_table select 1,"Ram","2.3","India",3500; > select * from hive_table; > insert into parquet_table select 1,"Ram","2.3","India",3500; > select * from 
parquet_table; > --execute the below query in hive beeline; > insert into hive_carbon select * from parquet_table; > Attached the logs for your reference. Only the insert into select from a hive > table into a carbon table is working fine; insert into select from a parquet > or another carbon table fails with the error below. > Error details in MR job which runs through the hive query: > Error: java.io.IOException: java.io.IOException: Database name is not set. at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:175) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at > org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169) Caused by: > java.io.IOException: Database name is not set. 
at > org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411) > ... 9 more -- This message was sent by Atlassian Jira (v8.3.4#803005)
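The trace bottoms out in CarbonInputFormat.getDatabaseName, i.e. the MR read path never receives a database name. As a hedged sketch only (the property names are copied from the CARBONDATA-3938 read-format DDL elsewhere in this digest, and it is an assumption, not a confirmed fix, that the write-format table honors them), one could try pinning the database and table explicitly before re-running the failing insert:

```sql
-- Sketch of a possible workaround, not a verified fix: set the carbon
-- input-format properties that the read-format DDL in this digest supplies,
-- so the CarbonInputFormat.getDatabaseName() lookup has a value to resolve.
ALTER TABLE hive_carbon SET SERDEPROPERTIES (
  'mapreduce.input.carboninputformat.databaseName'='default',
  'mapreduce.input.carboninputformat.tableName'='hive_carbon');
-- Then retry the failing statement from hive beeline:
insert into hive_carbon select * from parquet_table;
```

If the error persists, the database name is being dropped somewhere other than the serde properties, which would point back at the MapredCarbonInputFormat configuration path in the trace.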
[jira] [Closed] (CARBONDATA-3819) Fileformat column details is not present in the show segments DDL for heterogenous segments table.
[ https://issues.apache.org/jira/browse/CARBONDATA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3819. - Fixed and verified. > Fileformat column details is not present in the show segments DDL for > heterogenous segments table. > -- > > Key: CARBONDATA-3819 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3819 > Project: CarbonData > Issue Type: Bug > Environment: Opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: fileformat_notworking_actualresult.PNG, > fileformat_working_expected.PNG > > > Fileformat column details is not present in the show segments DDL for > heterogenous segments table. > Test steps: > # Create a heterogenous table with added parquet and carbon segments. > # DO show segments. > Expected results: > It should show "FileFormat" column details in show segments DDL. > Actual result: > It is not showing the File format column details in show segments DDL. > See the attached screenshots for more details. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (CARBONDATA-3819) Fileformat column details is not present in the show segments DDL for heterogenous segments table.
[ https://issues.apache.org/jira/browse/CARBONDATA-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran resolved CARBONDATA-3819. --- Resolution: Fixed This issue is fixed in the latest Carbon jars - 2.0.0. > Fileformat column details is not present in the show segments DDL for > heterogenous segments table. > -- > > Key: CARBONDATA-3819 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3819 > Project: CarbonData > Issue Type: Bug > Environment: Opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: fileformat_notworking_actualresult.PNG, > fileformat_working_expected.PNG > > > Fileformat column details is not present in the show segments DDL for > heterogenous segments table. > Test steps: > # Create a heterogenous table with added parquet and carbon segments. > # DO show segments. > Expected results: > It should show "FileFormat" column details in show segments DDL. > Actual result: > It is not showing the File format column details in show segments DDL. > See the attached screenshots for more details. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-4012) Documentations issues.
[ https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-4012. - Complex features details are added to the Opensource document and verified. > Documentations issues. > -- > > Key: CARBONDATA-4012 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4012 > Project: CarbonData > Issue Type: Bug >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > > Support Array and Struct of all primitive type reading on presto from Spark > Carbon tables. This feature details have to be added in the below opensource > link: > [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-4024) Select queries with filter and aggregate queries are not working in Hive write - carbon table.
[ https://issues.apache.org/jira/browse/CARBONDATA-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-4024. - Resolution: Duplicate > Select queries with filter and aggregate queries are not working in Hive > write - carbon table. > --- > > Key: CARBONDATA-4024 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4024 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 >Reporter: Prasanna Ravichandran >Priority: Major > > Select queries with filter and aggregate queries are not working in Hive > write - carbon table. > Hive - console: > 0: /> use t2; > INFO : State: Compiling. > INFO : Compiling > command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use > t2; > Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : hive.compile.auto.avoid.cbo=true > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); > Time taken: 0.122 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : State: Executing. > INFO : Executing > command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use > t2; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : Completed executing > command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); > Time taken: 0.019 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager > No rows affected (0.207 seconds) > 0: /> show tables; > INFO : State: Compiling. 
> INFO : Compiling > command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): > show tables; > Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : hive.compile.auto.avoid.cbo=true > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, > type:string, comment:from deserializer)], properties:null) > INFO : Completed compiling > command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); > Time taken: 0.015 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : State: Executing. > INFO : Executing > command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): > show tables; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : Completed executing > command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); > Time taken: 0.016 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager > ++ > | tab_name | > ++ > | hive_carbon | > | hive_table | > | parquet_table | > ++ > 3 rows selected (0.114 seconds) > 0: /> select * from hive_carbon; > INFO : State: Compiling. 
> INFO : Compiling > command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): > select * from hive_carbon; > Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : hive.compile.auto.avoid.cbo=true > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Current sql is not contains insert syntax, not need record dest table > flag > INFO : Semantic Analysis Completed (retrial = false) > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:hive_carbon.id, type:int, > comment:null), FieldSchema(name:hive_carbon.name, type:string, comment:null), > FieldSchema(name:hive_carbon.scale, type:decimal(10,0), comment:null), > FieldSchema(name:hive_carbon.country, type:string, comment:null), > FieldSchema(name:hive_carbon.salary, type:double, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); > Time taken: 0.511 seconds > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : State: Executing. > INFO : Executing > command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): > select * from hive_carbon; Current > sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 > INFO : Completed executing > command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); > Time taken: 0.001 seconds > INFO : OK > INFO : Concurrency mode is disabled, not creating a lock manager >
[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. Even the aggregate queries are not working.
[ https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3938: -- Description: In Hive read table, we are unable to read a projection column or full scan query. But the aggregate queries are working fine. Test query: --spark beeline; drop table if exists uniqdata; drop table if exists uniqdata1; CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata ; LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 'hdfs://hacluster/user/hive/warehouse/uniqdata'; select count(*) from uniqdata1; --Hive Beeline; select count(*) from uniqdata1; --not working, returning 0 rows, even though 2000 rows are there;--Issue 1 on Hive read format table; select * from uniqdata1; --Return no rows;--Issue 2 - a) full scan on Hive read
format table; select cust_id from uniqdata1 limit 5;--Return no rows;--Issue 2-b select query with projection, not working, returning no rows; Attached the logs for your reference. With the Hive write table the aggregate & filter queries are not working but select * full scan queries are working. All 3 issues (full scan - select *, filter queries and aggregate queries) are not working in the Hive read format table. This issue also exists when a normal carbon table (created through stored as carbondata) is created in Spark and data is read through a select query from Hive beeline. was: In Hive read table, we are unable to read a projection column or full scan query. But the aggregate queries are working fine. Test query: --spark beeline; drop table if exists uniqdata; drop table if exists uniqdata1; CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata ; LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 
'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 'hdfs://hacluster/user/hive/warehouse/uniqdata'; select count(*) from uniqdata1; --Hive Beeline; select count(*) from uniqdata1; --not working, returning 0 rows, even though 2000 rows are there;--Issue 1 on Hive read format table; select * from uniqdata1; --Return no rows;--Issue 2 - a) full scan on Hive read format table; select cust_id from uniqdata1 limit 5;--Return no rows;--Issue 2-b select query with projection, not working, returning no rows; Attached the logs for your reference. With the Hive write table this issue is not seen. Issue is only seen in Hive read format table. This issue also exists when a normal carbon table is created in Spark and read through Hive beeline. > In Hive read table, we are unable to read a projection
[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213673#comment-17213673 ] Prasanna Ravichandran commented on CARBONDATA-3807: --- Model plan with bloom details: (Could not attach the screenshot) == CarbonData Profiler == Table Scan on uniqdata - total: 2 blocks, 2 blocklets - filter: (cust_name <> null and cust_name = CUST_NAME_0) - pruned by Main Index - skipped: 0 blocks, 0 blocklets *- pruned by CG Index* *- name: datamapuniq_b1* *- provider: bloomfilter* - skipped: 0 blocks, 0 blocklets == Physical Plan == AdaptiveSparkPlan(isFinalPlan=false) +- HashAggregate(keys=[], functions=[count(1)]) +- Exchange SinglePartition, true, [id=#129] +- HashAggregate(keys=[], functions=[partial_count(1)]) +- Project +- Scan carbondata default.uniqdata[] PushedFilters: [IsNotNull(cust_name), EqualTo(cust_name,CUST_NAME_0)], ReadSchema: struct > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.0.0 > > Attachments: bloom-filtercolumn-plan.png, bloom-show index.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. 
> Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3807. - Fix Version/s: 2.0.0 Resolution: Not A Bug After adding the enable.query.statistics and then in the plan verification, we could see the Bloom filter related details in the explain query. This will be seen in plan, only after the create bloom index + load happens. With only create bloom index, it is not happening in plan. > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.0.0 > > Attachments: bloom-filtercolumn-plan.png, bloom-show index.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. 
> Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
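The resolution above can be condensed into a short verification sequence. This is a sketch: it assumes the uniqdata table and the datamapuniq_b1 bloom index from the test queries already exist, and it uses the enable.query.statistics property named in the resolution.

```sql
-- Bloom pruning details appear in EXPLAIN only after query statistics are
-- enabled and at least one load has completed after the index was created.
set enable.query.statistics=true;
load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
explain select count(*) from uniqdata where cust_name = 'CUST_NAME_0';
-- The plan should now carry a "pruned by CG Index ... provider: bloomfilter"
-- section, matching the plan excerpt quoted in the CARBONDATA-3807 comments.
```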
[jira] [Commented] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213670#comment-17213670 ] Prasanna Ravichandran commented on CARBONDATA-3807: --- !bloom_issue_verification_after_load.PNG! > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Fix For: 2.0.0 > > Attachments: bloom-filtercolumn-plan.png, bloom-show index.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-4029) Getting Number format exception while querying on date columns in SDK carbon table.
Prasanna Ravichandran created CARBONDATA-4029: - Summary: Getting Number format exception while querying on date columns in SDK carbon table. Key: CARBONDATA-4029 URL: https://issues.apache.org/jira/browse/CARBONDATA-4029 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Environment: 3 node FI cluster Reporter: Prasanna Ravichandran Attachments: Primitive.rar We are getting a Number format exception while querying on the date columns. Attached the SDK files also. Test queries: --SDK compaction; drop table if exists external_primitive; create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata; alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive4','format'='carbon'); alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive5','format'='carbon'); alter table external_primitive compact 'minor'; --working fine pass; select count(*) from external_primitive;--working fine pass; show segments for table external_primitive; select * from external_primitive limit 13; --working fine pass; select * from external_primitive limit 14; --failed getting number format exception; select min(dob) from external_primitive; --failed getting number format exception; select max(dob) from external_primitive; --working; select dob from external_primitive; --failed getting number format exception; Console:
*0: /> show segments for table external_primitive;*
+-----+-----------+--------------------------+-----------------+-----------+-----------+------------+-------------+
| ID  | Status    | Load Start Time          | Load Time Taken | Partition | Data Size | Index Size | File Format |
+-----+-----------+--------------------------+-----------------+-----------+-----------+------------+-------------+
| 4   | Success   | 2020-10-13 11:52:04.012  | 0.511S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 3   | Compacted | 2020-10-13 11:52:00.587  | 0.828S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 2   | Compacted | 2020-10-13 11:51:57.767  | 0.775S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 1   | Compacted | 2020-10-13 11:51:54.678  | 1.024S          | {}        | 1.88KB    | 655.0B     | columnar_v3 |
| 0.1 | Success   | 2020-10-13 11:52:05.986  | 5.785S          | {}        | 9.62KB    | 5.01KB     | columnar_v3 |
| 0   | Compacted | 2020-10-13 11:51:51.072  | 1.125S          | {}        | 8.55KB    | 4.25KB     | columnar_v3 |
+-----+-----------+--------------------------+-----------------+-----------+-----------+------------+-------------+
6 rows selected (0.45 seconds)
*0: /> select * from external_primitive limit 13;* --working fine pass;
INFO : Execution ID: 95
+-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
| id  | name | rank | salary          | active | dob        | doj                    | city      | dept  |
+-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
| 1   | AAA  | 3    | 3444345.66      | true   | 1979-12-09 | 2011-02-09 22:30:20.0  | Pune      | IT    |
| 2   | BBB  | 2    | 543124.66       | false  | 1987-02-19 | 2017-01-01 09:30:20.0  | Bangalore | DATA  |
| 3   | CCC  | 1    | 787878.888      | false  | 1982-05-12 | 2015-11-30 23:50:20.0  | Pune      | DATA  |
| 4   | DDD  | 1    | 9.24            | true   | 1981-04-09 | 2000-01-15 04:30:20.0  | Delhi     | MAINS |
| 5   | EEE  | 3    | 545656.99       | true   | 1987-12-09 | 2017-11-25 01:30:20.0  | Delhi     | IT    |
| 6   | FFF  | 2    | 768678.0        | false  | 1987-12-20 | 2017-01-10 02:30:20.0  | Bangalore | DATA  |
| 7   | GGG  | 3    | 765665.0        | true   | 1983-06-12 | 2016-12-31 23:30:20.0  | Pune      | IT    |
| 8   | HHH  | 2    | 567567.66       | false  | 1979-01-12 | 1995-01-01 09:30:20.0  | Bangalore | DATA  |
| 9   | III  | 2    | 787878.767      | true   | 1985-02-19 | 2005-08-14 22:30:20.0  | Pune      | DATA  |
| 10  | JJJ  | 3    | 887877.14       | true   | 2000-05-19 | 2016-10-10 09:30:20.0  | Bangalore | MAINS |
| 18  |      | 3    | 7.86786786787E9 | true   | 1980-10-05 | 1995-10-07 19:30:20.0  | Bangalore | IT    |
| 19  |      | 2    | 5464545.33      | true   | 1986-06-06 | 2008-08-14 22:30:20.0  | Delhi     | DATA  |
| 20  | NULL | 3    | 7867867.34      | true   | 2000-05-01 | 2014-01-18 09:30:20.0  | Bangalore | MAINS |
+-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
13 rows selected (1.775 seconds)
*0: /> select * from external_primitive limit 14;* --failed getting number format exception;
INFO : Execution ID: 97
*java.lang.NumberFormatException: For
[jira] [Created] (CARBONDATA-4024) Select queries with filter and aggregate queries are not working in Hive write - carbon table.
Prasanna Ravichandran created CARBONDATA-4024: - Summary: Select queries with filter and aggregate queries are not working in Hive write - carbon table. Key: CARBONDATA-4024 URL: https://issues.apache.org/jira/browse/CARBONDATA-4024 Project: CarbonData Issue Type: Bug Components: hive-integration Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran Select queries with filter and aggregate queries are not working in Hive write - carbon table. Hive - console: 0: /> use t2; INFO : State: Compiling. INFO : Compiling command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use t2; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : hive.compile.auto.avoid.cbo=true INFO : Concurrency mode is disabled, not creating a lock manager INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time taken: 0.122 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : State: Executing. INFO : Executing command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be): use t2; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=omm_20201008191831_ac10f1ae-8d39-4185-b25a-d690134a94be); Time taken: 0.019 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager No rows affected (0.207 seconds) 0: /> show tables; INFO : State: Compiling. 
INFO : Compiling command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show tables; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : hive.compile.auto.avoid.cbo=true INFO : Concurrency mode is disabled, not creating a lock manager INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) INFO : Completed compiling command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time taken: 0.015 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : State: Executing. INFO : Executing command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1): show tables; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : Starting task [Stage-0:DDL] in serial mode INFO : Completed executing command(queryId=omm_20201008191835_5e1e9469-0054-446f-af82-ec3294ec77b1); Time taken: 0.016 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager
+----------------+
| tab_name       |
+----------------+
| hive_carbon    |
| hive_table     |
| parquet_table  |
+----------------+
3 rows selected (0.114 seconds)
0: /> select * from hive_carbon; INFO : State: Compiling. 
INFO : Compiling command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): select * from hive_carbon; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : hive.compile.auto.avoid.cbo=true INFO : Concurrency mode is disabled, not creating a lock manager INFO : Current sql is not contains insert syntax, not need record dest table flag INFO : Semantic Analysis Completed (retrial = false) INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_carbon.id, type:int, comment:null), FieldSchema(name:hive_carbon.name, type:string, comment:null), FieldSchema(name:hive_carbon.scale, type:decimal(10,0), comment:null), FieldSchema(name:hive_carbon.country, type:string, comment:null), FieldSchema(name:hive_carbon.salary, type:double, comment:null)], properties:null) INFO : Completed compiling command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time taken: 0.511 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : State: Executing. INFO : Executing command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da): select * from hive_carbon; Current sessionId=35d8eaaa-6d9f-4e8e-a837-e059b4eb85b4 INFO : Completed executing command(queryId=omm_20201008191842_9378bab9-181c-455e-aa6d-9b4f787ce6da); Time taken: 0.001 seconds INFO : OK INFO : Concurrency mode is disabled, not creating a lock manager
+-----------------+-------------------+--------------------+----------------------+---------------------+
| hive_carbon.id  | hive_carbon.name  | hive_carbon.scale  | hive_carbon.country  | hive_carbon.salary  |
+-----------------+-------------------+--------------------+----------------------+---------------------+
| 1               | Ram               | 2                  | India                | 3500.0              |
+-----------------+-------------------+--------------------+----------------------+---------------------+
1 row selected (0.614 seconds)
0: /> select * from hive_carbon where
[jira] [Updated] (CARBONDATA-3937) Insert into select from another carbon /parquet table is not working on Hive Beeline on a newly created Hive write format - carbon table. We are getting “Database is
[ https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3937: -- Description: Insert into select from another carbon or parquet table to a carbon table is not working on Hive Beeline on a newly created Hive write format carbon table. We are getting “Database is not set” error. Test queries: drop table if exists hive_carbon; create table hive_carbon(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; insert into hive_carbon select 1,"Ram","2.3","India",3500; insert into hive_carbon select 2,"Raju","2.4","Russia",3600; insert into hive_carbon select 3,"Raghu","2.5","China",3700; insert into hive_carbon select 4,"Ravi","2.6","Australia",3800; drop table if exists hive_carbon2; create table hive_carbon2(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler'; insert into hive_carbon2 select * from hive_carbon; select * from hive_carbon; select * from hive_carbon2; --execute below queries in spark-beeline; create table hive_table(id int, name string, scale decimal, country string, salary double); create table parquet_table(id int, name string, scale decimal, country string, salary double) stored as parquet; insert into hive_table select 1,"Ram","2.3","India",3500; select * from hive_table; insert into parquet_table select 1,"Ram","2.3","India",3500; select * from parquet_table; --execute the below query in hive beeline; insert into hive_carbon select * from parquet_table; Attached the logs for your reference. Only the insert into select from a hive table into a carbon table is working fine; insert into select from a parquet or another carbon table fails with the error below. Error details in MR job which runs through the hive query: Error: java.io.IOException: java.io.IOException: Database name is not set. 
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: Database name is not set.
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
... 9 more
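The failing check in the stack trace above is CarbonInputFormat.getDatabaseName throwing when the MR job configuration carries no database name. As an illustration only — plain Python standing in for the Hadoop job configuration, and the fallback behaviour is an assumption, not CarbonData's actual fix — the shape of the failing check and a tolerant variant:

```python
# Configuration key as used in the CarbonHiveSerDe SERDEPROPERTIES
# elsewhere in these reports.
DB_NAME_KEY = "mapreduce.input.carboninputformat.databaseName"

def get_database_name(conf):
    # Shape of the failing check: raise when the job conf has no database name.
    name = conf.get(DB_NAME_KEY)
    if not name:
        raise IOError("Database name is not set.")
    return name

def get_database_name_with_fallback(conf, current_db="default"):
    # Hypothetical tolerant variant: fall back to the session's current database
    # instead of aborting the record-reader creation.
    return conf.get(DB_NAME_KEY) or current_db

print(get_database_name_with_fallback({}))                    # default
print(get_database_name_with_fallback({DB_NAME_KEY: "rps"}))  # rps
```

The insert-select path fails exactly because the conf handed to the record reader lacks this key, while direct loads populate it.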
[jira] [Updated] (CARBONDATA-3937) Insert into select from another carbon/parquet table is not working on Hive Beeline on a newly created Hive write format carbon table. We are getting a “Database is not set” error.
[ https://issues.apache.org/jira/browse/CARBONDATA-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3937: -- Description: Insert into select from another carbon table is not working on Hive Beeline on a newly created Hive write format carbon table. We are getting a “Carbondata files not found” error. Test queries:
drop table if exists hive_carbon;
create table hive_carbon(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon select 1,"Ram","2.3","India",3500;
insert into hive_carbon select 2,"Raju","2.4","Russia",3600;
insert into hive_carbon select 3,"Raghu","2.5","China",3700;
insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;
drop table if exists hive_carbon2;
create table hive_carbon2(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon2 select * from hive_carbon;
select * from hive_carbon;
select * from hive_carbon2;
--execute below queries in spark-beeline;
create table hive_table(id int, name string, scale decimal, country string, salary double);
create table parquet_table(id int, name string, scale decimal, country string, salary double) stored as parquet;
insert into hive_table select 1,"Ram","2.3","India",3500;
select * from hive_table;
insert into parquet_table select 1,"Ram","2.3","India",3500;
select * from parquet_table;
--execute the below query in hive beeline;
insert into hive_carbon select * from parquet_table;
Attached the logs for your reference. But the insert into select from the parquet and hive table into carbon table is working fine. Error details from the MR job run through the hive query: Error: java.io.IOException: java.io.IOException: Database name is not set.
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1737)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: Database name is not set.
at org.apache.carbondata.hadoop.api.CarbonInputFormat.getDatabaseName(CarbonInputFormat.java:841)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:80)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:215)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:205)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
... 9 more
was: Insert into select from another carbon table is not working on Hive Beeline on a newly created Hive write format carbon table. We are getting a “Carbondata files not found” error.
Test queries:
drop table if exists hive_carbon;
create table hive_carbon(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon select 1,"Ram","2.3","India",3500;
insert into hive_carbon select 2,"Raju","2.4","Russia",3600;
insert into hive_carbon select 3,"Raghu","2.5","China",3700;
insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;
drop table if exists hive_carbon2;
create table hive_carbon2(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon2 select * from hive_carbon;
select * from hive_carbon;
select * from hive_carbon2;
Attached the logs for your reference. But the insert into select from the parquet and hive table into carbon table is working fine. Error details from the MR job run through the hive query: Error: java.io.IOException: java.io.IOException: CarbonData file is not present in the table location
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) at
[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. Even the aggregate queries are not working.
[ https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3938: -- Description: In a Hive read table, we are unable to read a projection column or run a full scan (select *) query. Even the aggregate queries are not working. Test query:
--spark beeline;
drop table if exists uniqdata;
drop table if exists uniqdata1;
CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata;
LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1');
CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 'hdfs://hacluster/user/hive/warehouse/uniqdata';
select count(*) from uniqdata1;
--Hive Beeline;
select count(*) from uniqdata1; --not working, returns 0 rows even though 2000 rows are there; --Issue 1 on Hive read format table;
select * from uniqdata1; --Returns no rows; --Issue 2-a: full scan on Hive read format table;
select cust_id from uniqdata1 limit 5; --Returns no rows; --Issue 2-b: select query with projection, not working, returns no rows;
Attached the logs for your reference. With the Hive write table this issue is not seen; it is seen only in the Hive read format table. The issue also exists when a normal carbon table is created in Spark and read through Hive beeline.
was: In a Hive read table, we are unable to read a projection column or run a full scan query. But the aggregate queries are working fine. The test queries are the same as above, with these earlier observations:
select count(*) from uniqdata1; --Returns 2000;
select count(*) from uniqdata; --Returns 2000 - working fine;
select * from uniqdata1; --Returns no rows; --Issue 1 on Hive read format table;
select * from uniqdata; --Returns no rows; --Issue 2 while reading a normal carbon table created in spark;
select cust_id from uniqdata1 limit 5; --Returns no rows;
Attached the logs for your reference. With the Hive write table this issue is not seen; it is seen only in the Hive read format table. The issue also exists when a normal carbon table is created in Spark and read through Hive beeline.
Summary: In Hive read table, we are unable to read a projection column or read a full scan - select * query. Even the aggregate queries are not working. (was: In Hive read table, we are unable to read a projection column or read a full scan - select * query.
[jira] [Created] (CARBONDATA-4022) Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations.
Prasanna Ravichandran created CARBONDATA-4022: - Summary: Getting the error - "PathName is not a valid DFS filename." with index server and after adding carbon SDK segments and then doing select/update/delete operations. Key: CARBONDATA-4022 URL: https://issues.apache.org/jira/browse/CARBONDATA-4022 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran
Getting the error "PathName is not a valid DFS filename." during update/delete/select queries on an added SDK segment table. The path shown in the error is also malformed, which is the cause of the error. This is seen only when the index server is running and disable fallback is true. Queries and errors:
> create table sdk_2level_1(name string, rec1 struct>) stored as carbondata;
No rows selected (0.425 seconds)
> alter table sdk_2level_1 add segment options('path'='hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray','format'='carbondata');
No rows selected (0.77 seconds)
> select * from sdk_2level_1;
INFO : Execution ID: 1855
Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 600.0 failed 4 times, most recent failure: Lost task 0.3 in stage 600.0 (TID 21345, linux, executor 16): java.lang.IllegalArgumentException: Pathname /user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata from hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1hdfs:/hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:332)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:955)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:316)
at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.getDataInputStream(AbstractDFSCarbonFile.java:293)
at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:198)
at org.apache.carbondata.core.datastore.impl.FileFactory.getDataInputStream(FileFactory.java:188)
at org.apache.carbondata.core.reader.ThriftReader.open(ThriftReader.java:100)
at org.apache.carbondata.core.reader.CarbonHeaderReader.readHeader(CarbonHeaderReader.java:60)
at org.apache.carbondata.core.util.DataFileFooterConverterV3.readDataFileFooter(DataFileFooterConverterV3.java:65)
at org.apache.carbondata.core.util.CarbonUtil.getDataFileFooter(CarbonUtil.java:902)
at org.apache.carbondata.core.util.CarbonUtil.readMetadataFile(CarbonUtil.java:874)
at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getDataBlocks(AbstractQueryExecutor.java:216)
at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:138)
at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
at org.apache.carbondata.core.scan.executor.impl.DetailQueryExecutor.execute(DetailQueryExecutor.java:47)
at org.apache.carbondata.hadoop.CarbonRecordReader.initialize(CarbonRecordReader.java:117)
at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:584)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:301)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:293)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:857)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at
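Note the malformed pathname in the error above: the table location and the added SDK segment's absolute HDFS URI are fused together (...sdk_2level_1hdfs:/hacluster/...). A minimal sketch of that failure shape — hypothetical helper names, not CarbonData code, and the URI-aware variant is an assumption about the fix, not the actual patch:

```python
from urllib.parse import urlparse

def naive_join(table_path, segment_file):
    # Reproduces the bug shape: the segment file's absolute URI is appended
    # to the table location, fusing two schemes into one invalid pathname.
    return table_path.rstrip("/") + segment_file

def uri_aware_join(table_path, segment_file):
    # A path that carries its own scheme (hdfs://...) is already absolute
    # and must be used as-is rather than appended to the table location.
    if urlparse(segment_file).scheme:
        return segment_file
    return table_path.rstrip("/") + "/" + segment_file.lstrip("/")

table = "hdfs://hacluster/user/hive/warehouse/carbon.store/rps/sdk_2level_1"
segment = "hdfs://hacluster/sdkfiles/twolevelnestedrecwitharray/part-0-188852617294480_batchno0-0-null-188852332673632.carbondata"

print(naive_join(table, segment))      # invalid: ".../sdk_2level_1hdfs://hacluster/..."
print(uri_aware_join(table, segment))  # valid: the segment file URI itself
```

The invalid output matches the pathname DistributedFileSystem.getPathName rejects in the stack trace.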
[jira] [Created] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.
Prasanna Ravichandran created CARBONDATA-4021: - Summary: With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment. Key: CARBONDATA-4021 URL: https://issues.apache.org/jira/browse/CARBONDATA-4021 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran
We are getting the below issue when index server enable and index server fallback disable are configured as true: count(*) fails with the error below after adding the parquet and ORC segments. Queries and error:
> use rps;
No rows selected (0.054 seconds)
> drop table if exists uniqdata;
No rows selected (0.229 seconds)
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
No rows selected (0.756 seconds)
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
INFO : Execution ID: 95
No rows selected (2.789 seconds)
> use default;
No rows selected (0.052 seconds)
> drop table if exists uniqdata;
No rows selected (1.122 seconds)
> CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
No rows selected (0.508 seconds)
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
INFO : Execution ID: 108
No rows selected (1.316 seconds)
> drop table if exists uniqdata_parquet;
No rows selected (0.668 seconds)
> CREATE TABLE uniqdata_parquet (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as parquet;
No rows selected (0.397 seconds)
> insert into uniqdata_parquet select * from uniqdata;
INFO : Execution ID: 116
No rows selected (4.805 seconds)
> drop table if exists uniqdata_orc;
No rows selected (0.553 seconds)
> CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) using orc;
No rows selected (0.396 seconds)
> insert into uniqdata_orc select * from uniqdata;
INFO : Execution ID: 122
No rows selected (3.403 seconds)
> use rps;
No rows selected (0.06 seconds)
> Alter table uniqdata add segment options ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
INFO : Execution ID: 126
No rows selected (1.511 seconds)
> Alter table uniqdata add segment options ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_orc','format'='orc');
No rows selected (0.716 seconds)
> select count(*) from uniqdata;
Error: java.io.IOException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): java.security.PrivilegedActionException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 54.0 failed 4 times, most recent failure: Lost task 2.3
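The ADD SEGMENT statements above pass a 'path' and a 'format' option per segment. A hypothetical validator for that options map — illustrative only; the function name and the supported-format set are assumptions drawn from the formats used in these reports, not the CarbonData implementation:

```python
# Formats that appear with ADD SEGMENT in these reports.
SUPPORTED_FORMATS = {"carbon", "carbondata", "parquet", "orc"}

def parse_add_segment_options(options):
    # Validate the OPTIONS map of:
    #   ALTER TABLE t ADD SEGMENT OPTIONS ('path'='...', 'format'='...')
    # 'path' is mandatory; 'format' defaults to the native carbon format.
    path = options.get("path")
    fmt = options.get("format", "carbondata").lower()
    if not path:
        raise ValueError("'path' option is mandatory for ADD SEGMENT")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported segment format: " + fmt)
    return path, fmt

print(parse_add_segment_options(
    {"path": "hdfs://hacluster/user/hive/warehouse/uniqdata_parquet",
     "format": "parquet"}))
```

A count(*) over such a table must then dispatch a format-specific reader per segment, which is the code path the index server exercises here.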
[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205497#comment-17205497 ] Prasanna Ravichandran commented on CARBONDATA-3914: --- !image-2020-10-01-18-37-20-242.png! > We are getting the below error when executing select query on a carbon table > when no data is returned from hive beeline. > > > Key: CARBONDATA-3914 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3914 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 > Environment: 3 node One track ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Fix For: 2.1.0 > > Attachments: Nodatareturnedfromcarbontable-IOexception.png > > Time Spent: 6h 20m > Remaining Estimate: 0h > > If no data is present in the table, then we are getting the below IOException > in carbon, while running select queries on that empty table. But in hive even > if the table holds no data, then it is working for select queries. > Expected results: Even the table holds no records it should return 0 or no > rows returned. It should not throw error/exception. > Actual result: It is throwing IO exception - Unable to read carbon schema. > > Attached the screenshot for your reference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205495#comment-17205495 ] Prasanna Ravichandran commented on CARBONDATA-3914: --- Attached the screenshot after the fix.
[jira] [Closed] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
[ https://issues.apache.org/jira/browse/CARBONDATA-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3914. - This issue is fixed now. No errors are thrown when no rows are present in the carbon table.
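The fix confirmed above means an empty carbon table answers a select with zero rows instead of the "Unable to read carbon schema" IOException. A toy sketch of that guard — plain Python with a hypothetical reader shape, not the actual CarbonData change:

```python
import os
import tempfile

def read_rows(location):
    # Checking for data files before attempting to read the schema lets an
    # empty table return zero rows instead of raising an IOException.
    data_files = [f for f in os.listdir(location) if f.endswith(".carbondata")]
    if not data_files:
        return []  # empty table: "no rows returned", not an error
    raise NotImplementedError("a real reader would parse the carbon files here")

empty_table = tempfile.mkdtemp()  # a freshly created table location with no loads
print(read_rows(empty_table))  # []
```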
[jira] [Created] (CARBONDATA-4012) Documentations issues.
Prasanna Ravichandran created CARBONDATA-4012: - Summary: Documentations issues. Key: CARBONDATA-4012 URL: https://issues.apache.org/jira/browse/CARBONDATA-4012 Project: CarbonData Issue Type: Bug Reporter: Prasanna Ravichandran Support reading Array and Struct of all primitive types from Presto on Carbon tables. Details of this feature have to be added in the below opensource link: [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md]
[jira] [Updated] (CARBONDATA-4012) Documentations issues.
[ https://issues.apache.org/jira/browse/CARBONDATA-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-4012: -- Description: Support Array and Struct of all primitive type reading on presto from Spark Carbon tables. This feature details have to be added in the below opensource link: [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md] was: Support Array and Struct of all primitive type reading from presto from Carbon tables. This feature details have to be added in the below opensource link: [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md] > Documentations issues. > -- > > Key: CARBONDATA-4012 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4012 > Project: CarbonData > Issue Type: Bug >Reporter: Prasanna Ravichandran >Priority: Minor > > Support Array and Struct of all primitive type reading on presto from Spark > Carbon tables. This feature details have to be added in the below opensource > link: > [https://github.com/apache/carbondata/blob/master/docs/prestosql-guide.md] -- This message was sent by Atlassian Jira (v8.3.4#803005)
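As a sketch of the kind of example the requested documentation could include (hypothetical; the table name, column names, and values below are illustrative and not taken from the report), a Spark carbon table with Array and Struct columns could be created and then read from Presto:

```sql
-- In spark beeline (illustrative example):
CREATE TABLE complex_carbon (
  id INT,
  phones ARRAY<STRING>,
  address STRUCT<city: STRING, pincode: INT>
) STORED AS carbondata;

INSERT INTO complex_carbon
  SELECT 1, ARRAY('98450', '98451'), NAMED_STRUCT('city', 'Bangalore', 'pincode', 560001);

-- In the Presto CLI, assuming a catalog named "carbondata" is configured
-- (Presto arrays are 1-indexed; struct fields are accessed with dot notation):
SELECT id, phones[1], address.city FROM carbondata.default.complex_carbon;
```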
[jira] [Updated] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. But the aggregate queries are working fine.
[ https://issues.apache.org/jira/browse/CARBONDATA-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3938: -- Description: In Hive read table, we are unable to read a projection column or full scan query. But the aggregate queries are working fine. Test query: --spark beeline; drop table if exists uniqdata; drop table if exists uniqdata1; CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata ; LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 'hdfs://hacluster/user/hive/warehouse/uniqdata'; select count(*) from uniqdata1; --Hive Beeline; select count(*) from uniqdata1; --Returns 2000; select count(*) from uniqdata; --Returns 2000 - working fine; select * from uniqdata1; --Return no rows; --Issue 1 on Hive read format table; select * from uniqdata; --Returns no rows; --Issue 2 while reading a normal carbon table created in spark; select cust_id from uniqdata1 limit 5;--Return no rows; Attached the logs for your reference. With the Hive write table this issue is not seen. Issue is only seen in Hive read format table. This issue also exists when a normal carbon table is created in Spark and read through Hive beeline. was: In Hive read table, we are unable to read a projection column or full scan query. But the aggregate queries are working fine. Test query: --spark beeline; drop table if exists uniqdata; drop table if exists uniqdata1; CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata ; LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 'hdfs://hacluster/user/hive/warehouse/uniqdata'; select count(*) from uniqdata1; --Hive Beeline; select count(*)
from uniqdata1; --Returns 2000; select * from uniqdata1; --Return no rows; select cust_id from uniqdata1 limit 5;--Return no rows; Attached the logs for your reference. With the Hive write table this issue is not seen. Issue is only seen in Hive read format table. > In Hive read table, we are unable to read a projection column or read a full > scan - select * query. But the aggregate queries are working fine. > --- > > Key: CARBONDATA-3938 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3938 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 2.0.0 >Reporter: Prasanna
[jira] [Created] (CARBONDATA-3938) In Hive read table, we are unable to read a projection column or read a full scan - select * query. But the aggregate queries are working fine.
Prasanna Ravichandran created CARBONDATA-3938: - Summary: In Hive read table, we are unable to read a projection column or read a full scan - select * query. But the aggregate queries are working fine. Key: CARBONDATA-3938 URL: https://issues.apache.org/jira/browse/CARBONDATA-3938 Project: CarbonData Issue Type: Bug Components: hive-integration Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran Attachments: Hive on MR - Read projection column issue.txt In Hive read table, we are unable to read a projection column or full scan query. But the aggregate queries are working fine. Test query: --spark beeline; drop table if exists uniqdata; drop table if exists uniqdata1; CREATE TABLE uniqdata(CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) stored as carbondata ; LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata OPTIONS('DELIMITER'=',', 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); CREATE TABLE IF NOT EXISTS uniqdata1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' WITH SERDEPROPERTIES ('mapreduce.input.carboninputformat.databaseName'='default','mapreduce.input.carboninputformat.tableName'='uniqdata') STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' LOCATION 
'hdfs://hacluster/user/hive/warehouse/uniqdata'; select count(*) from uniqdata1; --Hive Beeline; select count(*) from uniqdata1; --Returns 2000; select * from uniqdata1; --Return no rows; select cust_id from uniqdata1 limit 5;--Return no rows; Attached the logs for your reference. With the Hive write table this issue is not seen. Issue is only seen in Hive read format table. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3937) Insert into select from another carbon table is not working on Hive Beeline on a newly created Hive write format - carbon table. We are getting "Carbondata files not found" error
Prasanna Ravichandran created CARBONDATA-3937: - Summary: Insert into select from another carbon table is not working on Hive Beeline on a newly created Hive write format - carbon table. We are getting a "Carbondata files not found" error. Key: CARBONDATA-3937 URL: https://issues.apache.org/jira/browse/CARBONDATA-3937 Project: CarbonData Issue Type: Bug Components: hive-integration Affects Versions: 2.0.0 Reporter: Prasanna Ravichandran
Insert into select from another carbon table is not working on Hive Beeline on a newly created Hive write format carbon table. We are getting a "Carbondata files not found" error. Test queries:
drop table if exists hive_carbon;
create table hive_carbon(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon select 1,"Ram","2.3","India",3500;
insert into hive_carbon select 2,"Raju","2.4","Russia",3600;
insert into hive_carbon select 3,"Raghu","2.5","China",3700;
insert into hive_carbon select 4,"Ravi","2.6","Australia",3800;
drop table if exists hive_carbon2;
create table hive_carbon2(id int, name string, scale decimal, country string, salary double) stored by 'org.apache.carbondata.hive.CarbonStorageHandler';
insert into hive_carbon2 select * from hive_carbon;
select * from hive_carbon;
select * from hive_carbon2;
Attached the logs for your reference. However, insert into select from a Parquet or Hive table into a carbon table works fine.
Error details in the MR job which ran through the hive query:
Error: java.io.IOException: java.io.IOException: CarbonData file is not present in the table location
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:414)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:843)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.io.IOException: CarbonData file is not present in the table location
at org.apache.carbondata.core.util.CarbonUtil.inferSchema(CarbonUtil.java:2141)
at org.apache.carbondata.core.metadata.schema.SchemaReader.inferSchema(SchemaReader.java:139)
at org.apache.carbondata.hive.MapredCarbonInputFormat.populateCarbonTable(MapredCarbonInputFormat.java:92)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getCarbonTable(MapredCarbonInputFormat.java:104)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getQueryModel(MapredCarbonInputFormat.java:203)
at org.apache.carbondata.hive.MapredCarbonInputFormat.getRecordReader(MapredCarbonInputFormat.java:192)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:411)
... 9 more
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3915) Correction in the documentation for spark-shell
Prasanna Ravichandran created CARBONDATA-3915: - Summary: Correction in the documentation for spark-shell Key: CARBONDATA-3915 URL: https://issues.apache.org/jira/browse/CARBONDATA-3915 Project: CarbonData Issue Type: Bug Components: hive-integration Affects Versions: 2.0.0 Environment: 3 node ANT cluster one track. Reporter: Prasanna Ravichandran
The spark-shell program given in [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] is not working. The working program is given below:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val newSpark = SparkSession.builder().config(sc.getConf).enableHiveSupport.config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions").getOrCreate()
newSpark.sql("drop table if exists hive_carbon").show
newSpark.sql("create table hive_carbon(id int, name string, scale decimal, country string, salary double) STORED AS carbondata").show
newSpark.sql("LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/samplehive.csv' INTO TABLE hive_carbon").show
newSpark.sql("SELECT * FROM hive_carbon").show()
The above working program could be updated on the [https://github.com/apache/carbondata/blob/master/docs/hive-guide.md] page, under the "Start Spark shell by running the following command in the Spark directory" section. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3914) We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline.
Prasanna Ravichandran created CARBONDATA-3914: - Summary: We are getting the below error when executing select query on a carbon table when no data is returned from hive beeline. Key: CARBONDATA-3914 URL: https://issues.apache.org/jira/browse/CARBONDATA-3914 Project: CarbonData Issue Type: Bug Components: hive-integration Affects Versions: 2.0.0 Environment: 3 node One track ANT cluster Reporter: Prasanna Ravichandran Attachments: Nodatareturnedfromcarbontable-IOexception.png
If no data is present in the table, then we get the below IOException in carbon while running select queries on that empty table. In Hive, however, select queries work even if the table holds no data.
Expected result: even if the table holds no records, it should return 0 or "no rows returned"; it should not throw an error/exception.
Actual result: it throws an IOException - "Unable to read carbon schema".
Attached the screenshot for your reference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
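The report does not include test queries, so the following is a hypothetical minimal reproduction sketch (table and column names assumed, following the pattern of the other reports in this thread):

```sql
-- Spark beeline: create a carbon table and perform no load, leaving it empty.
drop table if exists empty_carbon;
create table empty_carbon (id int, name string) stored as carbondata;

-- Hive beeline: selecting from the empty table reportedly throws
-- "IOException: Unable to read carbon schema" instead of returning 0 rows.
select * from empty_carbon;
select count(*) from empty_carbon;
```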
[jira] [Created] (CARBONDATA-3908) When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values.
Prasanna Ravichandran created CARBONDATA-3908: - Summary: When a carbon segment is added through the alter add segments query, then it is not accounting the added carbon segment values. Key: CARBONDATA-3908 URL: https://issues.apache.org/jira/browse/CARBONDATA-3908 Project: CarbonData Issue Type: Bug Affects Versions: 2.0.0 Environment: FI cluster and opensource cluster. Reporter: Prasanna Ravichandran
When a carbon segment is added through the alter add segment query, the added carbon segment's records are not accounted for. If we do count(*) on the added segment, it always shows 0. Test queries:
drop table if exists uniqdata;
CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata;
load data inpath 'hdfs://hacluster/BabuStore/Data/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
--hdfs dfs -mkdir /uniqdata-carbon-segment;
--hdfs dfs -cp /user/hive/warehouse/uniqdata/Fact/Part0/Segment_0/* /uniqdata-carbon-segment/
Alter table uniqdata add segment options ('path'='hdfs://hacluster/uniqdata-carbon-segment/','format'='carbon');
select count(*) from uniqdata;--4000 expected, as one load of 2000 records happened and the same segment is added again;
set carbon.input.segments.default.uniqdata=1;
select count(*) from uniqdata;--2000 expected - it should show only the record count of the added segment;
CONSOLE:
/> set carbon.input.segments.default.uniqdata=1;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 1      |
+------------------------------------------+--------+
1 row selected (0.192 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1734
+-----------+
| count(1)  |
+-----------+
| 2000      |
+-----------+
1 row selected (4.036 seconds)
/> set carbon.input.segments.default.uniqdata=2;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 2      |
+------------------------------------------+--------+
1 row selected (0.088 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1745
+-----------+
| count(1)  |
+-----------+
| 2000      |
+-----------+
1 row selected (6.056 seconds)
/> set carbon.input.segments.default.uniqdata=3;
+------------------------------------------+--------+
| key                                      | value  |
+------------------------------------------+--------+
| carbon.input.segments.default.uniqdata   | 3      |
+------------------------------------------+--------+
1 row selected (0.161 seconds)
/> select count(*) from uniqdata;
INFO : Execution ID: 1753
+-----------+
| count(1)  |
+-----------+
| 0         |
+-----------+
1 row selected (4.875 seconds)
/> show segments for table uniqdata;
+-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+
| ID  | Status   | Load Start Time          | Load Time Taken  | Partition  | Data Size  | Index Size  | File Format  |
+-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+
| 4   | Success  | 2020-07-17 16:01:53.673  | 5.579S           | {}         | 269.10KB   | 7.21KB      | columnar_v3  |
| 3   | Success  | 2020-07-17 16:00:24.866  | 0.578S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
| 2   | Success  | 2020-07-17 15:07:54.273  | 0.642S           | {}         | 36.72KB    | NA          | orc          |
| 1   | Success  | 2020-07-17 15:03:59.767  | 0.564S           | {}         | 89.26KB    | NA          | parquet      |
| 0   | Success  | 2020-07-16 12:44:32.095  | 4.484S           | {}         | 88.55KB    | 1.81KB      | columnar_v3  |
+-----+----------+--------------------------+------------------+------------+------------+-------------+--------------+
Expected result: records added by adding a carbon segment should be considered.
Actual result: records added by adding a carbon segment are not considered.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.
[ https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-3811. - Fixed.
> In Flat folder enabled table, it is returning no records while querying.
>
> Key: CARBONDATA-3811
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3811
> Project: CarbonData
> Issue Type: Bug
> Environment: opensource ANT cluster
> Reporter: Prasanna Ravichandran
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: Flat_folder_returning_zero.png
>
> Flat folder table is returning no records for select queries.
>
> Test queries:
> drop table if exists uniqdata1;
> CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata TBLPROPERTIES('flat_folder'='true');
> load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> select count(*) from uniqdata1;--0;
> select * from uniqdata1 limit 10;--0;
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.
[ https://issues.apache.org/jira/browse/CARBONDATA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3827: -- Description: This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. Merge DDL is not working as per the syntax mentioned in CARBONDATA-3597. Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working, getting parse exception; >merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; Error: org.apache.spark.sql.AnalysisException: == Parser1: org.apache.spark.sql.parser.CarbonExtensionSpark2SqlParser == [1.1] failure: identifier matching regex (?i)EXPLAIN expected merge into uniqdata1 as a
using uniqdata as b on a.cust_id=b.cust_id ^; == Parser2: org.apache.spark.sql.execution.SparkSqlParser == mismatched input 'merge' expecting {'(', 'SELECT', 'FROM', 'ADD', 'DESC', 'EMPOWER', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'MAP', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'REDUCE', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'DFS', 'TRUNCATE', 'ANALYZE', 'LIST', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'EXPORT', 'IMPORT', 'LOAD', 'HEALTHCHECK'}(line 1, pos 0) == SQL == merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id ^^^; (state=,code=0) was: This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata
options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working; Attached the screenshot for your reference. !image-2020-05-18-21-30-31-344.png! > Merge DDL is not working as per the mentioned syntax. > - > > Key: CARBONDATA-3827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3827 > Project: CarbonData > Issue Type: Bug >Reporter: Prasanna Ravichandran >Priority: Major > > This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. > Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 > Test queries: > drop table if exists uniqdata1; > CREATE TABLE uniqdata1 (cust_id
[jira] [Updated] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.
[ https://issues.apache.org/jira/browse/CARBONDATA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3827: -- Description: This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working; Attached the screenshot for your reference. !image-2020-05-18-21-30-31-344.png! was: This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. 
Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working; Attached the screenshot for your reference. > Merge DDL is not working as per the mentioned syntax. > - > > Key: CARBONDATA-3827 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3827 > Project: CarbonData > Issue Type: Bug >Reporter: Prasanna Ravichandran >Priority: Major > > This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. 
> Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 > Test queries: > drop table if exists uniqdata1; > CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata1 > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as
[jira] [Created] (CARBONDATA-3827) Merge DDL is not working as per the mentioned syntax.
Prasanna Ravichandran created CARBONDATA-3827: - Summary: Merge DDL is not working as per the mentioned syntax. Key: CARBONDATA-3827 URL: https://issues.apache.org/jira/browse/CARBONDATA-3827 Project: CarbonData Issue Type: Bug Reporter: Prasanna Ravichandran This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working; Attached the screenshot for your reference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3826) Merge DDL is not working as per the mentioned syntax.
Prasanna Ravichandran created CARBONDATA-3826: - Summary: Merge DDL is not working as per the mentioned syntax. Key: CARBONDATA-3826 URL: https://issues.apache.org/jira/browse/CARBONDATA-3826 Project: CarbonData Issue Type: Bug Reporter: Prasanna Ravichandran This issue is seen with opensource jars. Spark 2.4.5 & Carbon 2.0. Merge DDL is not working as per the mentioned syntax as in CARBONDATA-3597 Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); merge into uniqdata1 as a using uniqdata as b on a.cust_id=b.cust_id; --not working; Attached the screenshot for your reference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3819) FileFormat column details are not present in the show segments DDL for a heterogeneous segments table.
Prasanna Ravichandran created CARBONDATA-3819: - Summary: FileFormat column details are not present in the show segments DDL for a heterogeneous segments table. Key: CARBONDATA-3819 URL: https://issues.apache.org/jira/browse/CARBONDATA-3819 Project: CarbonData Issue Type: Bug Environment: Opensource ANT cluster Reporter: Prasanna Ravichandran Attachments: fileformat_notworking_actualresult.PNG, fileformat_working_expected.PNG FileFormat column details are not present in the show segments DDL for a heterogeneous segments table. Test steps: # Create a heterogeneous table with added parquet and carbon segments. # Do show segments. Expected result: The show segments DDL should show the "FileFormat" column details. Actual result: The show segments DDL does not show the FileFormat column details. See the attached screenshots for more details. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.
[ https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3811: -- Attachment: Flat_folder_returning_zero.png > In Flat folder enabled table, it is returning no records while querying. > > > Key: CARBONDATA-3811 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3811 > Project: CarbonData > Issue Type: Bug > Environment: opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: Flat_folder_returning_zero.png > > > Flat folder table is returning no records for select queries. > > Test queries: > drop table if exists uniqdata1; > CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata TBLPROPERTIES('flat_folder'='true'); > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata1 > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > select count(*) from uniqdata1;--0; > select * from uniqdata1 limit 10;--0; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3807: -- Attachment: bloom-show index.png > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: bloom-filtercolumn-plan.png, bloom-show index.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3807: -- Attachment: bloom-filtercolumn-plan.png > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: bloom-filtercolumn-plan.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3807: -- Attachment: (was: bloom-filtercolumn-plan.png) > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: bloom-filtercolumn-plan.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3807: -- Attachment: (was: bloom-show index.png) > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: bloom-filtercolumn-plan.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. > Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.
[ https://issues.apache.org/jira/browse/CARBONDATA-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3811: -- Attachment: (was: Flat_folder_returning_zero.png) > In Flat folder enabled table, it is returning no records while querying. > > > Key: CARBONDATA-3811 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3811 > Project: CarbonData > Issue Type: Bug > Environment: opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Major > > Flat folder table is returning no records for select queries. > > Test queries: > drop table if exists uniqdata1; > CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata TBLPROPERTIES('flat_folder'='true'); > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata1 > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > select count(*) from uniqdata1;--0; > select * from uniqdata1 limit 10;--0; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3811) In Flat folder enabled table, it is returning no records while querying.
Prasanna Ravichandran created CARBONDATA-3811: - Summary: In Flat folder enabled table, it is returning no records while querying. Key: CARBONDATA-3811 URL: https://issues.apache.org/jira/browse/CARBONDATA-3811 Project: CarbonData Issue Type: Bug Environment: opensource ANT cluster Reporter: Prasanna Ravichandran Attachments: Flat_folder_returning_zero.png Flat folder table is returning no records for select queries. Test queries: drop table if exists uniqdata1; CREATE TABLE uniqdata1 (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata TBLPROPERTIES('flat_folder'='true'); load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata1 options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); select count(*) from uniqdata1;--0; select * from uniqdata1 limit 10;--0; -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-3807: -- Description: Filter queries and projection queries with bloom columns are not hitting the bloom datamap. Bloom datamap is unused as per plan, even though created. Test queries: drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); show indexes on uniqdata; explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting; explain select cust_name from uniqdata; --not hitting; was: Filter queries and projection queries with bloom columns are not hitting the bloom datamap. 
Test queries: drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); show indexes on uniqdata; explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting; explain select cust_name from uniqdata; --not hitting; > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > --- > > Key: CARBONDATA-3807 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 > Project: CarbonData > Issue Type: Bug > Environment: Ant cluster - opensource >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: bloom-filtercolumn-plan.png, bloom-show index.png > > > Filter queries and projection queries with bloom columns are not hitting the > bloom datamap. > Bloom datamap is unused as per plan, even though created. 
> Test queries: > drop table if exists uniqdata; > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 > bigint,decimal_column1 decimal(30,10), decimal_column2 > decimal(36,36),double_column1 double, double_column2 double,integer_column1 > int) stored as carbondata; > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into > table uniqdata > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); > create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' > PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); > show indexes on uniqdata; > explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; > --not hitting; > explain select cust_name from uniqdata; --not hitting; > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CARBONDATA-3807) Filter queries and projection queries with bloom columns are not hitting the bloom datamap.
Prasanna Ravichandran created CARBONDATA-3807: - Summary: Filter queries and projection queries with bloom columns are not hitting the bloom datamap. Key: CARBONDATA-3807 URL: https://issues.apache.org/jira/browse/CARBONDATA-3807 Project: CarbonData Issue Type: Bug Environment: Ant cluster - opensource Reporter: Prasanna Ravichandran Attachments: bloom-filtercolumn-plan.png, bloom-show index.png Filter queries and projection queries with bloom columns are not hitting the bloom datamap. Test queries: drop table if exists uniqdata; CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), decimal_column2 decimal(36,36),double_column1 double, double_column2 double,integer_column1 int) stored as carbondata; load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into table uniqdata options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force'); create datamap datamapuniq_b1 on table uniqdata(cust_name) as 'bloomfilter' PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1'); show indexes on uniqdata; explain select count(*) from uniqdata where cust_name="CUST_NAME_0"; --not hitting; explain select cust_name from uniqdata; --not hitting; -- This message was sent by Atlassian Jira (v8.3.4#803005)
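[Editor's note] One point worth separating in the report above: a BloomFilter index can prune data only when a query carries a filter predicate on the indexed column, so a bare projection has nothing for it to prune and would not be expected to hit the index even when it works. Only the filtered query is a genuine candidate; a sketch (assuming the index from the repro exists):

```sql
-- Candidate for bloom pruning: equality filter on the indexed column.
explain select count(*) from uniqdata where cust_name = 'CUST_NAME_0';

-- Not a candidate: a plain projection carries no predicate, so there is
-- nothing for a bloom filter to prune, regardless of the index.
explain select cust_name from uniqdata;
```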
[jira] [Created] (CARBONDATA-2920) For the Long string data, the local dictionary threshold is not reached even if the threshold condition is met
Prasanna Ravichandran created CARBONDATA-2920: - Summary: For the Long string data, the local dictionary threshold is not reached even if the threshold condition is met Key: CARBONDATA-2920 URL: https://issues.apache.org/jira/browse/CARBONDATA-2920 Project: CarbonData Issue Type: Bug Environment: 3 node cluster Reporter: Prasanna Ravichandran For the Long string data, the local dictionary threshold is not reached even if the threshold condition is met. 【Test step】: 1. Create table with long string column with local dictionary threshold as 1000. 2. Load more than 1000 distinct LONG data. 3. Check if the threshold is met. *Test queries:* drop table if exists 1klongdata; create table 1klongdata(st string) stored by 'carbondata' TBLPROPERTIES('local_dictionary_enable'='true','local_dictionary_threshold'='1000','long_string_columns'='st'); load data inpath "hdfs://hacluster/user/prasanna/1005longdata.csv" into table 1klongdata options('fileheader'='st'); 【Expected Output】:Once the local dictionary threshold is crossed, it should display as "Local Dictionary threshold reached for the column: col_name, Unable to generate dictionary value. Dictionary threshold reached" in executor log. 【Actual Output】:It is not printing the fallback details for long data even if the threshold limit is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
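[Editor's note] Before inspecting the executor log for the fallback message, it may help to confirm that the repro's preconditions actually hold; a sketch of two checks (table and column names from the repro above; exact DESCRIBE output fields depend on the CarbonData version):

```sql
-- Confirm the load carries enough distinct values to cross the
-- configured local_dictionary_threshold of 1000.
select count(distinct st) from 1klongdata;

-- Confirm the local dictionary properties took effect on the table.
describe formatted 1klongdata;
```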
[jira] [Updated] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.
[ https://issues.apache.org/jira/browse/CARBONDATA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2892: -- Attachment: (was: Array.csv) > Data mismatch is seen in the Array-String and Array-Timestamp. > -- > > Key: CARBONDATA-2892 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2892 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: Array.csv > > > Data mismatch is seen in the Array-String and Array-Timestamp columns, such as mismatches > in data, ordering, and date values. > *Test queries:* > drop table if exists array_com_hive; > create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, > GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT > array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$'; > load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive; > drop table if exists array_com; > create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER > string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING > array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, > CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; > insert into Array_com select * from array_com_hive; > select * from array_com_hive order by CUST_ID ASC limit 3; > select * from array_com order by CUST_ID ASC limit 3; > *Expected result:* > There should be no data mismatch and the data in the table should match the > CSV file. > *Actual result:* > Data mismatch is seen. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.
[ https://issues.apache.org/jira/browse/CARBONDATA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2892: -- Attachment: Array.csv > Data mismatch is seen in the Array-String and Array-Timestamp. > -- > > Key: CARBONDATA-2892 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2892 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: Array.csv > > > Data mismatch is seen in the Array-String and Array-Timestamp columns, such as mismatches > in data, ordering, and date values. > *Test queries:* > drop table if exists array_com_hive; > create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, > GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT > array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$'; > load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive; > drop table if exists array_com; > create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER > string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING > array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, > CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; > insert into Array_com select * from array_com_hive; > select * from array_com_hive order by CUST_ID ASC limit 3; > select * from array_com order by CUST_ID ASC limit 3; > *Expected result:* > There should be no data mismatch and the data in the table should match the > CSV file. > *Actual result:* > Data mismatch is seen. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
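[Editor's note] A compact way to verify the mismatch described above, rather than eyeballing two ordered selects, is a set difference in both directions; an empty result each way means the tables agree. A sketch (assuming the engine supports set operators over array columns, which Spark SQL does for arrays):

```sql
-- Rows present in the hive-format table but missing or altered in the
-- carbon table, and vice versa. Both should return zero rows if the
-- insert preserved the data exactly.
select * from array_com_hive except select * from array_com;
select * from array_com except select * from array_com_hive;
```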
[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
[ https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2893: -- Attachment: structofarray.csv > Job aborted during insert while loading the "Struct of Array" datatype values. > -- > > Key: CARBONDATA-2893 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: structofarray.csv > > > Job aborted during insert while loading the "Struct of Array" datatype values. > *Test queries:* > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.026 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR > int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$' > map keys terminated by '&'; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.159 seconds) > 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' > into table STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.217 seconds) > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.03 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, > MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) > using carbon; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.099 seconds) > 0: jdbc:hive2:> 
insert into STRUCT_OF_ARRAY_com select * from > STRUCT_OF_ARRAY_com_hive; > *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* > > *Expected result:* > Insert should succeed. > *Actual result:* > Insert fails with "Job aborted". > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
[ https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2893: -- Attachment: (was: structofarray.csv) > Job aborted during insert while loading the "Struct of Array" datatype values. > -- > > Key: CARBONDATA-2893 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: structofarray.csv > > > Job aborted during insert while loading the "Struct of Array" datatype values. > *Test queries:* > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.026 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR > int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$' > map keys terminated by '&'; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.159 seconds) > 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' > into table STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.217 seconds) > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.03 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, > MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) > using carbon; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.099 seconds) > 0: 
jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from > STRUCT_OF_ARRAY_com_hive; > *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* > > *Expected result:* > Insert should succeed. > *Actual result:* > Insert fails with "Job aborted". > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
[ https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2893: -- Attachment: structofarray.csv > Job aborted during insert while loading the "Struct of Array" datatype values. > -- > > Key: CARBONDATA-2893 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: structofarray.csv > > > Job aborted during insert while loading the "Struct of Array" datatype values. > *Test queries:* > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.026 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR > int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$' > map keys terminated by '&'; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.159 seconds) > 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' > into table STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.217 seconds) > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.03 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, > MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) > using carbon; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.099 seconds) > 0: jdbc:hive2:> 
insert into STRUCT_OF_ARRAY_com select * from > STRUCT_OF_ARRAY_com_hive; > *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* > > *Expected result:* > Insert should succeed. > *Actual result:* > Insert fails with "Job aborted". > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
[ https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2893: -- Attachment: (was: arrayofstruct.csv) > Job aborted during insert while loading the "Struct of Array" datatype values. > -- > > Key: CARBONDATA-2893 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: structofarray.csv > > > Job aborted during insert while loading the "Struct of Array" datatype values. > *Test queries:* > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.026 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR > int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row > format delimited fields terminated by ',' collection items terminated by '$' > map keys terminated by '&'; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.159 seconds) > 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' > into table STRUCT_OF_ARRAY_com_hive; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.217 seconds) > 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.03 seconds) > 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, > MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, > STRUCT_OF_ARRAY struct,sal1: > array,state: array,date1: array>,CARD_COUNT > int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) > using carbon; > +--+-+ > |Result| > +--+-+ > +--+-+ > No rows selected (0.099 seconds) > 0: 
jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from > STRUCT_OF_ARRAY_com_hive; > *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* > > *Expected result:* > Insert should succeed. > *Actual result:* > Insert fails with a job-aborted error. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
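The Hive-side table above uses ',' for fields, '$' for collection items and '&' for nested elements. As a rough standalone illustration only (the real parsing happens inside Hive/CarbonData, and the actual field names in structofarray.csv are not shown in this report), one struct-of-array field could be split like this:

```python
# Hypothetical sketch of how the delimiters in the CREATE TABLE above split a
# "struct of array" CSV field: collection items by '$', nested elements by '&'.
# The sample value is illustrative, not taken from structofarray.csv.

def parse_struct_of_array(field, item_sep="$", nested_sep="&"):
    """Split one CSV field into a list of lists, mimicking
    'collection items terminated by' / nested-element delimiters."""
    return [item.split(nested_sep) for item in field.split(item_sep)]

print(parse_struct_of_array("10&20&30$100&200"))  # [['10', '20', '30'], ['100', '200']]
```

A value that round-trips through this split on the Hive side but aborts on insert into the carbon table points at the complex-type writer rather than the delimiters themselves.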
[jira] [Updated] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
[ https://issues.apache.org/jira/browse/CARBONDATA-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2893: -- Description: Job aborted during insert while loading the "Struct of Array" datatype values. *Test queries:* 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.026 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row format delimited fields terminated by ',' collection items terminated by '$' map keys terminated by '&'; +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.159 seconds) 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' into table STRUCT_OF_ARRAY_com_hive; +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.217 seconds) 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.03 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; +--+-+ |Result| +--+-+ +--+-+ No rows selected (0.099 seconds) 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from STRUCT_OF_ARRAY_com_hive; *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* *Expected result:* Insert should be success. *Actual result:* Insert is showing job aborted. was: Job aborted during insert while loading the "Struct of Array" datatype values. 
*Test queries:* 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.026 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row format delimited fields terminated by ',' collection items terminated by '$' map keys terminated by '&'; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.159 seconds) 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' into table STRUCT_OF_ARRAY_com_hive; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.217 seconds) 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.03 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.099 seconds) 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from STRUCT_OF_ARRAY_com_hive; *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* > Job aborted during insert while loading the "Struct of Array" datatype values. > -- > > Key: CARBONDATA-2893 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node ANT. >Reporter: Prasanna Ravichandran >Priority: Major > Attachments: arrayofstruct.csv > > > Job aborted during insert while loading the "Struct of Array" datatype values. 
[jira] [Created] (CARBONDATA-2893) Job aborted during insert while loading the "Struct of Array" datatype values.
Prasanna Ravichandran created CARBONDATA-2893: - Summary: Job aborted during insert while loading the "Struct of Array" datatype values. Key: CARBONDATA-2893 URL: https://issues.apache.org/jira/browse/CARBONDATA-2893 Project: CarbonData Issue Type: Bug Environment: 3 Node ANT. Reporter: Prasanna Ravichandran Attachments: arrayofstruct.csv Job aborted during insert while loading the "Struct of Array" datatype values. *Test queries:* 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com_hive; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.026 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT float, HQ_DEPOSIT double) row format delimited fields terminated by ',' collection items terminated by '$' map keys terminated by '&'; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.159 seconds) 0: jdbc:hive2:> load data local inpath '/opt/csv/complex/structofarray.csv' into table STRUCT_OF_ARRAY_com_hive; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.217 seconds) 0: jdbc:hive2:> drop table if exists STRUCT_OF_ARRAY_com; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.03 seconds) 0: jdbc:hive2:> create table STRUCT_OF_ARRAY_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_OF_ARRAY struct,sal1: array,state: array,date1: array>,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; +-+--+ | Result | +-+--+ +-+--+ No rows selected (0.099 seconds) 0: jdbc:hive2:> insert into STRUCT_OF_ARRAY_com select * from STRUCT_OF_ARRAY_com_hive; *Error: org.apache.spark.SparkException: Job aborted. (state=,code=0)* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2892) Data mismatch is seen in the Array-String and Array-Timestamp.
Prasanna Ravichandran created CARBONDATA-2892: - Summary: Data mismatch is seen in the Array-String and Array-Timestamp. Key: CARBONDATA-2892 URL: https://issues.apache.org/jira/browse/CARBONDATA-2892 Project: CarbonData Issue Type: Bug Environment: 3 Node ANT. Reporter: Prasanna Ravichandran Attachments: Array.csv Data mismatch is seen in the Array-String and Array-Timestamp like mismatch in data, order, date values. *Test queries:* drop table if exists array_com_hive; create table array_com_hive (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) row format delimited fields terminated by ',' collection items terminated by '$'; load data local inpath '/opt/csv/complex/Array.csv' into table array_com_hive; drop table if exists array_com; create table Array_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, ARRAY_INT array,ARRAY_STRING array,ARRAY_DATE array,CARD_COUNT int,DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) using carbon; insert into Array_com select * from array_com_hive; select * from array_com_hive order by CUST_ID ASC limit 3; select * from array_com order by CUST_ID ASC limit 3; *Expected result:* There should be no data mismatch and data in table should be same as it is in CSV file. *Actual result:* Data mismatch is seen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
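The mismatch reported above covers data, element order and date values. A minimal order-sensitive check (hypothetical rows, not from Array.csv) shows the kind of comparison that catches it: array columns must match element-for-element after the insert-select round trip, not merely as sets.

```python
# Hypothetical sketch: compare array columns from the Hive source table and the
# carbon target table element-for-element. The keys and values are made up.
source = {"C001": ["2015-01-01 00:00:00", "2016-02-02 00:00:00"]}
loaded = {"C001": ["2016-02-02 00:00:00", "2015-01-01 00:00:00"]}  # order flipped

# Order-sensitive comparison flags the row; a set comparison would miss it.
mismatches = {k for k in source if source[k] != loaded.get(k)}
print(mismatches)  # {'C001'}
```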
[jira] [Closed] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.
[ https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-2822. - > Carbon Configuration - "carbon.invisible.segments.preserve.count" > configuration property is not working as expected. > - > > Key: CARBONDATA-2822 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2822 > Project: CarbonData > Issue Type: Bug > Components: core, file-format > Environment: 3 Node ANT cluster. >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: configuration.png > > > For the *carbon.invisible.segments.preserve.count* configuration, it is not > working as expected. > +*Steps to reproduce:*+ > 1) Setting up "*carbon.invisible.segments.preserve.count=20"* in > carbon.properties and restarting the thrift server. > > 2) After performing Loading 40 times and Compaction 4 times. > 3) Perform clean files, so that the tablestatus.history file would be > generated with invisible segments details. > So Total 44 segments would be created including visible and invisible > segments.(40 load segment (like segment ID from 0,1,2...39) + 4 compacted new > segment(like 0.1,20.1,22.1,0.2)) > In that, *41 segments information are present in the "tablestatus.history" > file(*which holds invisible(marked for delete and compacted) segments > details) and 3 segments information are present in the "tablestatus" > file(which holds visible segments(0 .2 -final compacted segment) along with > (1^st^ segment - 0th segment) and (last segment-39th segment)). *But > invisible segment preserve count is configured to 20, which is not followed > for the tablestatus.history file.* > +*Expected result:*+ > tablestatus.history file should preserve only the latest 20 segments, as per > the configuration. > +*Actual result:*+ > tablestatus.history file is having 41 invisible segments details.(which is > above the configured value: 20) > > This is tested with ANT cluster. 
[jira] [Resolved] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.
[ https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran resolved CARBONDATA-2822. --- Resolution: Invalid Working fine. > Carbon Configuration - "carbon.invisible.segments.preserve.count" > configuration property is not working as expected. > - > > Key: CARBONDATA-2822 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2822 > Project: CarbonData > Issue Type: Bug > Components: core, file-format > Environment: 3 Node ANT cluster. >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: configuration.png > > > For the *carbon.invisible.segments.preserve.count* configuration, it is not > working as expected. > +*Steps to reproduce:*+ > 1) Setting up "*carbon.invisible.segments.preserve.count=20"* in > carbon.properties and restarting the thrift server. > > 2) After performing Loading 40 times and Compaction 4 times. > 3) Perform clean files, so that the tablestatus.history file would be > generated with invisible segments details. > So Total 44 segments would be created including visible and invisible > segments.(40 load segment (like segment ID from 0,1,2...39) + 4 compacted new > segment(like 0.1,20.1,22.1,0.2)) > In that, *41 segments information are present in the "tablestatus.history" > file(*which holds invisible(marked for delete and compacted) segments > details) and 3 segments information are present in the "tablestatus" > file(which holds visible segments(0 .2 -final compacted segment) along with > (1^st^ segment - 0th segment) and (last segment-39th segment)). *But > invisible segment preserve count is configured to 20, which is not followed > for the tablestatus.history file.* > +*Expected result:*+ > tablestatus.history file should preserve only the latest 20 segments, as per > the configuration. 
> +*Actual result:*+ > tablestatus.history file is having 41 invisible segments details.(which is > above the configured value: 20) > > This is tested with ANT cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2822) Carbon Configuration - "carbon.invisible.segments.preserve.count" configuration property is not working as expected.
[ https://issues.apache.org/jira/browse/CARBONDATA-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571437#comment-16571437 ] Prasanna Ravichandran commented on CARBONDATA-2822: --- The property *"carbon.invisible.segments.preserve.count"* is actually for TableStatusFile only. When we set this property, if the number of invisible segments in tablestatus file exceeds that configured *carbon.invisible.segments.preserve.count value,* then it is moving all the invisible segments to the tablestatus.history file. It is working fine as expected. > Carbon Configuration - "carbon.invisible.segments.preserve.count" > configuration property is not working as expected. > - > > Key: CARBONDATA-2822 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2822 > Project: CarbonData > Issue Type: Bug > Components: core, file-format > Environment: 3 Node ANT cluster. >Reporter: Prasanna Ravichandran >Priority: Minor > Attachments: configuration.png > > > For the *carbon.invisible.segments.preserve.count* configuration, it is not > working as expected. > +*Steps to reproduce:*+ > 1) Setting up "*carbon.invisible.segments.preserve.count=20"* in > carbon.properties and restarting the thrift server. > > 2) After performing Loading 40 times and Compaction 4 times. > 3) Perform clean files, so that the tablestatus.history file would be > generated with invisible segments details. > So Total 44 segments would be created including visible and invisible > segments.(40 load segment (like segment ID from 0,1,2...39) + 4 compacted new > segment(like 0.1,20.1,22.1,0.2)) > In that, *41 segments information are present in the "tablestatus.history" > file(*which holds invisible(marked for delete and compacted) segments > details) and 3 segments information are present in the "tablestatus" > file(which holds visible segments(0 .2 -final compacted segment) along with > (1^st^ segment - 0th segment) and (last segment-39th segment)). 
*But > invisible segment preserve count is configured to 20, which is not followed > for the tablestatus.history file.* > +*Expected result:*+ > tablestatus.history file should preserve only the latest 20 segments, as per > the configuration. > +*Actual result:*+ > tablestatus.history file is having 41 invisible segments details.(which is > above the configured value: 20) > > This is tested with ANT cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
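The behavior described in the closing comment can be sketched as follows (an assumption-laden simplification, not CarbonData's actual implementation): the property applies to the tablestatus file, and once the number of invisible segments there exceeds the configured count, all of them move to tablestatus.history.

```python
# Sketch of carbon.invisible.segments.preserve.count as described in the
# comment above: the threshold gates the tablestatus file, and exceeding it
# moves ALL invisible segments to the history file (hence 41 entries there).
def apply_preserve_count(invisible_in_tablestatus, history, preserve_count):
    if len(invisible_in_tablestatus) > preserve_count:
        history.extend(invisible_in_tablestatus)
        invisible_in_tablestatus = []
    return invisible_in_tablestatus, history

invisible = [str(i) for i in range(41)]  # 41 invisible segments, as in the report
invisible, history = apply_preserve_count(invisible, [], 20)
print(len(invisible), len(history))  # 0 41
```

Under this reading, finding 41 entries in tablestatus.history is expected, which is why the issue was resolved as invalid.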
[jira] [Updated] (CARBONDATA-2816) MV Datamap - With the hive metastore disabled, MV is not working as expected.
[ https://issues.apache.org/jira/browse/CARBONDATA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2816: -- Description: When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then the below issues are seen. CARBONDATA-2534 CARBONDATA-2539 CARBONDATA-2576 was: When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then the below issues are seen. CARBONDATA-2540 CARBONDATA-2539 CARBONDATA-2576 > MV Datamap - With the hive metastore disabled, MV is not working as expected. > - > > Key: CARBONDATA-2816 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2816 > Project: CarbonData > Issue Type: Bug > Components: data-query >Reporter: Prasanna Ravichandran >Priority: Minor > Labels: MV > > When the hive metastore is disabled(spark.carbon.hive.schema.store=false), > then the below issues are seen. > CARBONDATA-2534 > CARBONDATA-2539 > CARBONDATA-2576 > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-2816) MV Datamap - With the hive metastore disabled, MV is not working as expected.
Prasanna Ravichandran created CARBONDATA-2816: - Summary: MV Datamap - With the hive metastore disabled, MV is not working as expected. Key: CARBONDATA-2816 URL: https://issues.apache.org/jira/browse/CARBONDATA-2816 Project: CarbonData Issue Type: Bug Components: data-query Reporter: Prasanna Ravichandran When the hive metastore is disabled(spark.carbon.hive.schema.store=false), then the below issues are seen. CARBONDATA-2540 CARBONDATA-2539 CARBONDATA-2576 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2576: -- Description: MV is not working fine if there are more than 3 aggregate functions in the same datamap. It works fine with up to 3 aggregate functions on the same MV. Please see the attached document for more details. Test queries: scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false) ++ ++ ++ rebuild data scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false) ++ ++ ++ scala> carbon.sql("explain select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false) org.apache.spark.sql.AnalysisException: expression 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() (or first_value) if you don't care which value you get.;; Aggregate [origintable_empno#2925|#2925], [origintable_empno#2925 AS empno#3002, max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006|#2925 AS empno#3002, max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006] +- SubqueryAlias datamap_comp_maxsumminavg_table +- Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929|#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929] CarbonDatasourceHadoopRelation [ Database name :default, Table name :datamap_comp_maxsumminavg_table, Schema :Some(StructType(StructField(origintable_empno,IntegerType,true), StructField(max_projectenddate,TimestampType,true), StructField(sum_salary,LongType,true), StructField(min_projectjoindate,TimestampType,true), StructField(avg_attendance,DoubleType,true))) ] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253) at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52) at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148) at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95) at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72) at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38) at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46) at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27) at
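The AnalysisException above arises because the rewritten query reads `avg_attendance` as a plain stored column instead of re-aggregating it. Beyond the analyzer error, avg() is not self-decomposable: a correct MV rollup must store sum and count per group, not the average itself. A small numeric sketch (made-up values) shows why:

```python
# Why a stored per-group average cannot be rolled up directly: the average of
# per-group averages differs from the overall average. Storing (sum, count)
# per MV row allows a correct rollup. Values are illustrative.
groups = [  # hypothetical stored MV rows: (sum_attendance, count_attendance)
    (90, 2),  # group average 45.0
    (10, 1),  # group average 10.0
]
avg_of_avgs = sum(s / c for s, c in groups) / len(groups)          # 27.5 (wrong)
true_avg = sum(s for s, _ in groups) / sum(c for _, c in groups)   # 100/3 (right)
print(avg_of_avgs, true_avg)
```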
[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()
[ https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565167#comment-16565167 ] Prasanna Ravichandran commented on CARBONDATA-2534: --- When the user executes the MV datamap query, it should be accessed from MV_Table. > MV Dataset - MV creation is not working with the substring() > - > > Key: CARBONDATA-2534 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2534 > Project: CarbonData > Issue Type: Bug > Components: data-query > Environment: 3 node opensource ANT cluster >Reporter: Prasanna Ravichandran >Priority: Minor > Labels: CarbonData, MV, Materialistic_Views > Fix For: 1.5.0, 1.4.1 > > Attachments: MV_substring.docx, data.csv > > Time Spent: 3h 10m > Remaining Estimate: 0h > > MV creation is not working with the sub string function. We are getting the > spark.sql.AnalysisException while trying to create a MV with the substring > and aggregate function. > *Spark -shell test queries:* > scala> carbon.sql("create datamap mv_substr using 'mv' as select > sum(salary),substring(empname,2,5),designation from originTable group by > substring(empname,2,5),designation").show(200,false) > *org.apache.spark.sql.AnalysisException: Cannot create a table having a > column whose name contains commas in Hive metastore. 
Table: > `default`.`mv_substr_table`; Column: substring_empname,_2,_5;* > *at* > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316) > at > org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108) > at > 
org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97) > at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155) > at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95) > at > org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126) > at > org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68) > at > org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103) > at > org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53) > at > org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118) > at > org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:183) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108) > at > org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97) > at
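The root cause in the trace above is that the MV's generated column name `substring_empname,_2,_5` carries the commas from `substring(empname,2,5)`, and the Hive metastore rejects comma-bearing column names. A hedged sketch of the kind of sanitization a fix would need (hypothetical helper, not CarbonData's actual code):

```python
import re

# Hypothetical sanitizer: replace every character the Hive metastore cannot
# accept in a column name (commas, parentheses, etc.) with an underscore
# before creating the MV backing table.
def sanitize_mv_column(name):
    return re.sub(r"[^0-9A-Za-z_]", "_", name)

print(sanitize_mv_column("substring_empname,_2,_5"))  # substring_empname__2__5
```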
[jira] [Closed] (CARBONDATA-2528) MV Datamap - When the MV is created with the order by, then when we execute the corresponding query defined in MV with order by, then the data is not accessed from the MV.
[ https://issues.apache.org/jira/browse/CARBONDATA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-2528. - Closed. > MV Datamap - When the MV is created with the order by, then when we execute > the corresponding query defined in MV with order by, then the data is not > accessed from the MV. > > > Key: CARBONDATA-2528 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2528 > Project: CarbonData > Issue Type: Bug > Components: data-query > Environment: 3 node Opensource ANT cluster. (Opensource Hadoop 2.7.2+ > Opensource Spark 2.2.1+ Opensource Carbondata 1.3.1) >Reporter: Prasanna Ravichandran >Assignee: Ravindra Pesala >Priority: Minor > Labels: CarbonData, MV, Materialistic_Views > Fix For: 1.5.0, 1.4.1 > > Attachments: MV_orderby.docx, data.csv > > Time Spent: 6h > Remaining Estimate: 0h > > When the MV is created with the order by condition, then when we execute the > corresponding query defined in MV along with order by, then the data is not > accessed from the MV. The data is being accessed from the maintable only. > Test queries: > create datamap MV_order using 'mv' as select > empno,sum(salary)+sum(utilization) as total from originTable group by empno > order by empno; > create datamap MV_desc_order using 'mv' as select > empno,sum(salary)+sum(utilization) as total from originTable group by empno > order by empno DESC; > rebuild datamap MV_order; > rebuild datamap MV_desc_order; > explain select empno,sum(salary)+sum(utilization) as total from originTable > group by empno order by empno; > explain select empno,sum(salary)+sum(utilization) as total from originTable > group by empno order by empno DESC; > Expected result: MV with order by condition should access data from the MV > table only. > > Please see the attached document for more details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2540) MV Dataset - Unionall queries are not fetching data from MV dataset.
[ https://issues.apache.org/jira/browse/CARBONDATA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-2540. - > MV Dataset - Unionall queries are not fetching data from MV dataset. > > > Key: CARBONDATA-2540 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2540 > Project: CarbonData > Issue Type: Bug > Components: data-query >Reporter: Prasanna Ravichandran >Assignee: Ravindra Pesala >Priority: Minor > Labels: Carbondata, MV, Materialistic_Views > Fix For: 1.5.0, 1.4.1 > > Attachments: data_mv.csv > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Unionall queries are not fetching data from MV dataset. > Test queries: > scala> carbon.sql("drop table if exists fact_table1").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("CREATE TABLE fact_table1 (empno int, empname String, > designation String, doj Timestamp,workgroupcategory int, > workgroupcategoryname String, deptno int, deptname String,projectcode int, > projectjoindate Timestamp, projectenddate Timestamp,attendance > int,utilization int,salary int)STORED BY > 'org.apache.carbondata.format'").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("drop table if exists fact_table2").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("CREATE TABLE fact_table2 (empno int, empname String, > designation String, doj Timestamp,workgroupcategory int, > workgroupcategoryname String, deptno int, deptname String,projectcode int, > projectjoindate Timestamp, projectenddate Timestamp,attendance 
> int,utilization int,salary int)STORED BY > 'org.apache.carbondata.format'").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > > scala> carbon.sql("create datamap mv_unional using 'mv' as Select Z.empno > From (Select empno,empname From fact_table1 Union All Select empno,empname > from fact_table2) As Z Group By Z.empno").show(200,false) > ++ > || > ++ > ++ > > scala> carbon.sql("rebuild datamap mv_unional").show() > ++ > || > ++ > ++ > scala> carbon.sql("explain Select Z.empno From (Select empno,empname From > fact_table1 Union All Select empno,empname from fact_table2) As Z Group By > Z.empno").show(200,false) >
[jira] [Commented] (CARBONDATA-2540) MV Dataset - Unionall queries are not fetching data from MV dataset.
[ https://issues.apache.org/jira/browse/CARBONDATA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565129#comment-16565129 ] Prasanna Ravichandran commented on CARBONDATA-2540: --- Validation added. Closed.

*Terminal:*
> create datamap mv_unional using 'mv' as Select Z.empno From (Select empno,empname From fact_table1 Union All Select empno,empname from fact_table2) As Z Group By Z.empno;
*Error: java.lang.UnsupportedOperationException: MV is not supported for this query (state=,code=0)*

> MV Dataset - Unionall queries are not fetching data from MV dataset.
>
> Key: CARBONDATA-2540
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2540
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Reporter: Prasanna Ravichandran
> Assignee: Ravindra Pesala
> Priority: Minor
> Labels: Carbondata, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: data_mv.csv
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Unionall queries are not fetching data from MV dataset. 
> Test queries: > scala> carbon.sql("drop table if exists fact_table1").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("CREATE TABLE fact_table1 (empno int, empname String, > designation String, doj Timestamp,workgroupcategory int, > workgroupcategoryname String, deptno int, deptname String,projectcode int, > projectjoindate Timestamp, projectenddate Timestamp,attendance > int,utilization int,salary int)STORED BY > 'org.apache.carbondata.format'").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("drop table if exists fact_table2").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("CREATE TABLE fact_table2 (empno int, empname String, > designation String, doj Timestamp,workgroupcategory int, > workgroupcategoryname String, deptno int, deptname String,projectcode int, > projectjoindate Timestamp, projectenddate Timestamp,attendance > int,utilization int,salary int)STORED BY > 'org.apache.carbondata.format'").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > scala> carbon.sql("LOAD DATA local inpath > 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 > OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '\"','timestampformat'='dd-MM-')").show(200,false) > ++ > || > ++ > ++ > > scala> carbon.sql("create datamap mv_unional using 'mv' as Select Z.empno > From (Select empno,empname 
From fact_table1 Union All Select empno,empname > from fact_table2) As Z Group By Z.empno").show(200,false) > ++ > || > ++ > ++ > > scala> carbon.sql("rebuild datamap mv_unional").show() > ++ > || > ++ > ++ > scala> carbon.sql("explain Select Z.empno From (Select empno,empname From > fact_table1 Union All Select empno,empname from fact_table2) As Z Group By > Z.empno").show(200,false) >
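Since the added validation now rejects UNION ALL at MV creation time, a possible workaround is to define one MV per branch of the union. This is only a hedged sketch using the fact tables from the report; the `mv_t1`/`mv_t2` names are hypothetical, and whether the optimizer then prunes each branch through its own MV would need verification:

```sql
-- Hypothetical per-branch MVs instead of the rejected UNION ALL definition:
create datamap mv_t1 using 'mv' as select empno from fact_table1 group by empno;
create datamap mv_t2 using 'mv' as select empno from fact_table2 group by empno;
rebuild datamap mv_t1;
rebuild datamap mv_t2;
```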
[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565103#comment-16565103 ] Prasanna Ravichandran commented on CARBONDATA-2539: --- The sub-queries are still not accessing the data from the MV datamap.

Terminal:
> create datamap dm3 using 'mv' as *select min(workgroupcategory) from origintable*;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.392 seconds)
> select distinct workgroupcategory from originTable;
+--------------------+
| workgroupcategory  |
+--------------------+
| 1                  |
| 3                  |
| 2                  |
+--------------------+
3 rows selected (0.664 seconds)
> select count(*) from originTable where workgroupcategory=1;
+-----------+
| count(1)  |
+-----------+
| 5         |
+-----------+
1 row selected (0.349 seconds)
> explain SELECT max(empno) FROM originTable WHERE workgroupcategory IN (*select min(workgroupcategory) from originTable*) group by empname;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*HashAggregate(keys=[empname#24982], functions=[max(empno#24981)])
+- Exchange hashpartitioning(empname#24982, 200)
   +- *HashAggregate(keys=[empname#24982], functions=[partial_max(empno#24981)])
      +- *Project [empno#24981, empname#24982]
         +- *BroadcastHashJoin [workgroupcategory#24985], [*min(workgroupcategory)*#25804], LeftSemi, BuildRight
            :- *FileScan carbondata *rtyo.origintable*[empno#24981,empname#24982,designation#24983,doj#24984,workgroupcategory#24985,workgroupcategoryname#24986,deptno#24987,deptname#24988,projectcode#24989,projectjoindate#24990,projectenddate#24991,attendance#24992,utilization#24993,salary#24994]
            +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
               +- *HashAggregate(keys=[], functions=[min(workgroupcategory#24985)])
                  +- Exchange SinglePartition
                     +- *HashAggregate(keys=[], functions=[partial_min(workgroupcategory#24985)])
                        +- *FileScan carbondata *rtyo.origintable*[workgroupcategory#24985] |
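The plan above shows both the outer scan and the nested min() scan hitting the main table, so it appears only the exact defining query of dm3 would be rewritten, not the IN-subquery form. A hedged workaround sketch, using the table and datamap names from the report; the inlined value below is a placeholder for illustration, not an actual result:

```sql
-- Served by dm3 (its exact defining query):
select min(workgroupcategory) from originTable;

-- The nested form is not rewritten; manually inlining the scalar result
-- avoids the subquery (the literal 1 is illustrative, not real data):
SELECT max(empno) FROM originTable
WHERE workgroupcategory = 1   -- result of the min() query above
GROUP BY empname;
```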
[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()
[ https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565016#comment-16565016 ] Prasanna Ravichandran commented on CARBONDATA-2534: --- The MV creation now works with the substring function without any error, but when the user runs the corresponding query, it does not access the data from the MV datamap.

*Terminal:*
> create datamap mv_substr using 'mv' as select sum(salary),substring(empname,2,5),designation from originTable group by substring(empname,2,5),designation;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.661 seconds)
> explain select sum(salary),substring(empname,2,5),designation from originTable group by substring(empname,2,5),designation;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 2
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*HashAggregate(keys=[substring(empname#18267, 2, 5)#18352, designation#18268], functions=[sum(cast(salary#18279 as bigint))])
+- Exchange hashpartitioning(substring(empname#18267, 2, 5)#18352, designation#18268, 200)
   +- *HashAggregate(keys=[substring(empname#18267, 2, 5) AS substring(empname#18267, 2, 5)#18352, designation#18268], functions=[partial_sum(cast(salary#18279 as bigint))])
      +- *FileScan carbondata *b011.origintable*[empname#18267,designation#18268,salary#18279] |
+------+
2 rows selected (0.432 seconds)

> MV Dataset - MV creation is not working with the substring()
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Environment: 3 node opensource ANT cluster
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_substring.docx, data.csv
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> MV creation is not working with the substring function.
> We are getting the spark.sql.AnalysisException while trying to create a MV with the substring and aggregate function.
>
> *Spark-shell test queries:*
> scala> carbon.sql("create datamap mv_substr using 'mv' as select sum(salary),substring(empname,2,5),designation from originTable group by substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore. Table: `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
> at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
> at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
> at
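The AnalysisException above stems from the auto-generated MV column name `substring_empname,_2,_5`, whose commas the Hive metastore rejects. A hedged workaround sketch, not from the report: explicitly aliasing each expression should give the MV table comma-free column names (the `sum_salary`/`emp_sub` aliases are hypothetical, and whether the MV matcher then rewrites the original un-aliased query is exactly what this issue tracks):

```sql
-- Alias the expressions so the generated MV column names contain no commas:
create datamap mv_substr using 'mv' as
  select sum(salary) as sum_salary,
         substring(empname, 2, 5) as emp_sub,   -- avoids "substring_empname,_2,_5"
         designation
  from originTable
  group by substring(empname, 2, 5), designation;
```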
[jira] [Commented] (CARBONDATA-2528) MV Datamap - When the MV is created with the order by, then when we execute the corresponding query defined in MV with order by, then the data is not accessed from
[ https://issues.apache.org/jira/browse/CARBONDATA-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565003#comment-16565003 ] Prasanna Ravichandran commented on CARBONDATA-2528: --- Now the data is fetched from the MV datamap for the order by queries. Working fine.

explain select attendance,sum(salary)+sum(utilization) as total from originTable group by attendance order by attendance DESC;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on mv_desc_attendance_table
 - total blocklets: 4
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*Sort [attendance#12952 DESC NULLS LAST], true, 0
+- Exchange rangepartitioning(attendance#12952 DESC NULLS LAST, 200)
   +- *Project [origintable_attendance#12897 AS attendance#12952, total#12898L]
      +- *FileScan carbondata b011.*mv_desc_attendance_table*[origintable_attendance#12897,total#12898L] |
+------+

explain select empno,sum(salary)+sum(utilization) as total from originTable group by empno order by empno;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on mv_order_table
 - total blocklets: 6
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*Sort [empno#12822 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(empno#12822 ASC NULLS FIRST, 200)
   +- *Project [origintable_empno#10724 AS empno#12822, total#10725L]
      +- *FileScan carbondata b011.mv_order_table[origintable_empno#10724,total#10725L] |
+------+

> MV Datamap - When the MV is created with the order by, then when we execute the corresponding query defined in MV with order by, then the data is not accessed from the MV.
>
> Key: CARBONDATA-2528
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2528
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Environment: 3 node Opensource ANT cluster. 
(Opensource Hadoop 2.7.2+ > Opensource Spark 2.2.1+ Opensource Carbondata 1.3.1) >Reporter: Prasanna Ravichandran >Assignee: Ravindra Pesala >Priority: Minor > Labels: CarbonData, MV, Materialistic_Views > Fix For: 1.5.0, 1.4.1 > > Attachments: MV_orderby.docx, data.csv > > Time Spent: 6h > Remaining Estimate: 0h > > When the MV is created with the order by condition, then when we execute the > corresponding query defined in MV along with order by, then the data is not > accessed from the MV. The data is being accessed from the maintable only. > Test queries: > create datamap MV_order using 'mv' as select > empno,sum(salary)+sum(utilization) as total from originTable group by empno > order by empno; > create datamap MV_desc_order using 'mv' as select > empno,sum(salary)+sum(utilization) as total from originTable group by empno > order by empno DESC; > rebuild
[jira] [Updated] (CARBONDATA-2731) Timeseries datamap queries should fetch data from the Timeseries datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2731: -- Description: While creating the timeseries datamap, the queries to which it applies are also defined. So when the user runs that same query after creating the TS datamap, the query should fetch the data from the created TS datamap.

Test queries:

create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1');

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

CREATE DATAMAP agg0_time ON TABLE brinjal USING 'timeSeries' DMPROPERTIES ('EVENT_TIME'='productionDate','SECOND_GRANULARITY'='1') AS SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

0: jdbc:hive2://10.18.98.136:23040/default> show datamap on table brinjal;
+--------------+---------------+-------------------------+----------------------------------------------------------+
| DataMapName  | ClassName     | Associated Table        | DataMap Properties                                       |
+--------------+---------------+-------------------------+----------------------------------------------------------+
| agg0_time    | timeSeries    | *rp.brinjal_agg0_time*  | 'event_time'='productionDate', 'second_granularity'='1'  |
| sensor       | preaggregate  | rp.brinjal_sensor       |                                                          |
+--------------+---------------+-------------------------+----------------------------------------------------------+
2 rows selected (0.042 seconds)

0: jdbc:hive2://10.18.98.136:23040/default> explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;
+------+
| plan |
+------+
| == CarbonData Profiler ==
Table Scan on brinjal
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*HashAggregate(keys=[productionDate#155228], functions=[sum(cast(imei#155221 as double))])
+- Exchange hashpartitioning(productionDate#155228, 200)
   +- *HashAggregate(keys=[productionDate#155228], functions=[partial_sum(cast(imei#155221 as double))])
      +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :rp, *Table name :brinjal,* Schema :Some(StructType(StructField(imei,StringType,true), StructField(amsize,StringType,true), StructField(channelsid,StringType,true), StructField(activecountry,StringType,true), StructField(activecity,StringType,true), StructField(gamepointid,DoubleType,true), StructField(deviceinformationid,DoubleType,true), StructField(productiondate,TimestampType,true), StructField(deliverydate,TimestampType,true), StructField(deliverycharge,DoubleType,true))) ]
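For comparison, CarbonData's documented timeseries flow expresses rollup queries through the timeseries() UDF, which the datamap is built to serve; the plain GROUP BY productionDate form shown in this report is what the issue expects to also hit brinjal_agg0_time. A hedged sketch of the UDF form, assuming the second-level granularity defined above:

```sql
-- Query shape the timeseries datamap is designed to answer:
SELECT timeseries(productionDate, 'second'), SUM(imei)
FROM brinjal
GROUP BY timeseries(productionDate, 'second');
```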
[jira] [Created] (CARBONDATA-2731) Timeseries datamap queries should fetch data from the Timeseries datamap.
Prasanna Ravichandran created CARBONDATA-2731:
-
Summary: Timeseries datamap queries should fetch data from the Timeseries datamap.
Key: CARBONDATA-2731
URL: https://issues.apache.org/jira/browse/CARBONDATA-2731
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.4.1
Environment: Spark 2.2
Reporter: Prasanna Ravichandran

While creating the timeseries datamap, the queries to which it applies are also defined. So when the user runs that same query after creating the TS datamap, the query should fetch the data from the created TS datamap.

Test queries:

create table brinjal (imei string,AMSize string,channelsId string,ActiveCountry string, Activecity string,gamePointId double,deviceInformationId double,productionDate Timestamp,deliveryDate timestamp,deliverycharge double) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('table_blocksize'='1');

LOAD DATA INPATH 'hdfs://hacluster/user/prasanna/vardhandaterestruct.csv' INTO TABLE brinjal OPTIONS('DELIMITER'=',', 'QUOTECHAR'= '"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'= 'imei,deviceInformationId,AMSize,channelsId,ActiveCountry,Activecity,gamePointId,productionDate,deliveryDate,deliverycharge');

CREATE DATAMAP agg0_time ON TABLE brinjal USING 'timeSeries' DMPROPERTIES ('EVENT_TIME'='productionDate','SECOND_GRANULARITY'='1') AS SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate;

!image-2018-07-11-18-06-15-260.png! 
0: jdbc:hive2://10.18.98.136:23040/default> show datamap on table brinjal; +--+---+---+--+--+ | DataMapName | ClassName | Associated Table | DataMap Properties | +--+---+---+--+--+ | agg0_time | timeSeries | *rp.brinjal_agg0_time* | 'event_time'='productionDate', 'second_granularity'='1' | | sensor | preaggregate | rp.brinjal_sensor | | +--+---+---+--+--+ 2 rows selected (0.042 seconds) 0: jdbc:hive2://10.18.98.136:23040/default> explain SELECT productionDate, SUM(imei) FROM brinjal GROUP BY productionDate; ++--+ | plan | ++--+ | == CarbonData Profiler == Table Scan on brinjal - total blocklets: 1 - filter: none - pruned by Main DataMap - skipped blocklets: 0 | | == Physical Plan == *HashAggregate(keys=[productionDate#155228], functions=[sum(cast(imei#155221 as double))]) +- Exchange hashpartitioning(productionDate#155228, 200) +- *HashAggregate(keys=[productionDate#155228], functions=[partial_sum(cast(imei#155221 as double))]) +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :rp, *Table name :brinjal,* Schema :Some(StructType(StructField(imei,StringType,true), StructField(amsize,StringType,true), StructField(channelsid,StringType,true), StructField(activecountry,StringType,true),
[jira] [Closed] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.
[ https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-2522. - Resolution: Invalid > MV dataset when created with Joins, then it is not pointing towards the MV, > while executing that join query. > > > Key: CARBONDATA-2522 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2522 > Project: CarbonData > Issue Type: Bug > Environment: 3 Node Opensource ANT Cluster. >Reporter: Prasanna Ravichandran >Priority: Minor > Labels: MV, Materialistic_Views > Attachments: MV_joins.docx, data_mv.csv, > image-2018-06-27-12-10-38-516.png > > > When MV is created on Joining tables, then the explain of that join query > points to the maintable, instead of the created MV datamap. > Queries: > drop table if exists fact_table1; > CREATE TABLE fact_table1 (empno int, empname String, designation String, doj > Timestamp, > workgroupcategory int, workgroupcategoryname String, deptno int, deptname > String, > projectcode int, projectjoindate Timestamp, projectenddate > Timestamp,attendance int, > utilization int,salary int) > STORED BY 'org.apache.carbondata.format'; > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > drop table if exists fact_table2; > CREATE TABLE fact_table2 (empno int, empname String, designation String, doj > Timestamp, > workgroupcategory int, workgroupcategoryname String, deptno int, deptname > String, > projectcode int, projectjoindate Timestamp, projectenddate > Timestamp,attendance int, > utilization int,salary int) > STORED BY 'org.apache.carbondata.format'; > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table2 
OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > drop table if exists fact_table3; > CREATE TABLE fact_table3 (empno int, empname String, designation String, doj > Timestamp, > workgroupcategory int, workgroupcategoryname String, deptno int, deptname > String, > projectcode int, projectjoindate Timestamp, projectenddate > Timestamp,attendance int, > utilization int,salary int) > STORED BY 'org.apache.carbondata.format'; > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO > TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= > '"','timestampformat'='dd-MM-'); > create datamap datamap25 using 'mv' as select t1.empname as c1, > t2.designation from fact_table1 t1,fact_table2 t2,fact_table3 t3 where > t1.empname = t2.empname and t1.empname=t3.empname; > explain create datamap datamap25 using 'mv' as select t1.empname as c1, > t2.designation from fact_table1 t1,fact_table2 t2,fact_table3 t3 where > t1.empname = t2.empname and t1.empname=t3.empname; > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.
[ https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524656#comment-16524656 ] Prasanna Ravichandran commented on CARBONDATA-2522:
---
!image-2018-06-27-12-10-38-516.png!
Working fine after rebuilding the datamap.
> MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.
>
> Key: CARBONDATA-2522
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2522
> Project: CarbonData
> Issue Type: Bug
> Environment: 3 Node Opensource ANT Cluster.
> Reporter: Prasanna Ravichandran
> Priority: Minor
> Labels: MV, Materialistic_Views
> Attachments: MV_joins.docx, data_mv.csv, image-2018-06-27-12-10-38-516.png
>
> When an MV is created on joined tables, the explain output of that join query points to the main table instead of the created MV datamap.
> Queries:
> drop table if exists fact_table1;
> CREATE TABLE fact_table1 (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table1 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> drop table if exists fact_table2;
> CREATE TABLE fact_table2 (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> drop table if exists fact_table3;
> CREATE TABLE fact_table3 (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format';
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data_mv.csv' INTO TABLE fact_table2 OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"', 'timestampformat'='dd-MM-');
> create datamap datamap25 using 'mv' as select t1.empname as c1, t2.designation from fact_table1 t1, fact_table2 t2, fact_table3 t3 where t1.empname = t2.empname and t1.empname = t3.empname;
> explain create datamap datamap25 using 'mv' as select t1.empname as c1, t2.designation from fact_table1 t1, fact_table2 t2, fact_table3 t3 where t1.empname = t2.empname and t1.empname = t3.empname;
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
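The comment above reports that the join query is rewritten to the MV only after the datamap is rebuilt. As a hedged sketch of that workaround, using the table and datamap names from the report (the expected plan text follows the `<datamap>_table` naming seen elsewhere in this archive and is an assumption, not a verified output):

```sql
-- Workaround from the comment: rebuild the MV once after creation, then re-check the plan.
rebuild datamap datamap25;
explain select t1.empname as c1, t2.designation
from fact_table1 t1, fact_table2 t2, fact_table3 t3
where t1.empname = t2.empname and t1.empname = t3.empname;
-- After the rebuild, the CarbonData Profiler section should reference
-- datamap25_table rather than scanning the fact tables directly.
```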
[jira] [Updated] (CARBONDATA-2522) MV dataset when created with Joins, then it is not pointing towards the MV, while executing that join query.
[ https://issues.apache.org/jira/browse/CARBONDATA-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2522:
--
Attachment: image-2018-06-27-12-10-38-516.png
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran closed CARBONDATA-2537.
-
Resolution: Invalid
Users have to rebuild the datamap once after creation; after that it works fine.
> MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
>
> Key: CARBONDATA-2537
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2537
> Project: CarbonData
> Issue Type: Bug
> Components: data-query
> Environment: 3 Node Opensource ANT cluster.
> Reporter: Prasanna Ravichandran
> Assignee: xubo245
> Priority: Minor
> Labels: Carbondata, MV, Materialistic_Views
> Attachments: data.csv, image-2018-05-25-15-50-23-903.png, image-2018-06-27-11-53-49-587.png, image-2018-06-27-11-54-31-158.png
>
> User queries with a 'having' condition are not accessing the data from the MV datamap; they access the data from the main table.
> Test queries - spark shell:
> scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format'").show()
> scala> carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '\"', 'timestampformat'='dd-MM-')").show()
> scala> carbon.sql("select empno from originTable having salary>1").show(200,false)
> +-----+
> |empno|
> +-----+
> |14   |
> |15   |
> |20   |
> |19   |
> +-----+
> scala> carbon.sql("create datamap mv_hav using 'mv' as select empno from originTable having salary>1").show(200,false)
> scala> carbon.sql("explain select empno from originTable having salary>1").show(200,false)
> == CarbonData Profiler ==
> Table Scan on origintable
>  - total blocklets: 1
>  - filter: (salary <> null and salary > 1)
>  - pruned by Main DataMap
>  - skipped blocklets: 0
> == Physical Plan ==
> *Project [empno#1131]
> +- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table name :origintable, Schema :Some(StructType(StructField(empno,IntegerType,true), StructField(empname,StringType,true), StructField(designation,StringType,true), StructField(doj,TimestampType,true), StructField(workgroupcategory,IntegerType,true), StructField(workgroupcategoryname,StringType,true), StructField(deptno,IntegerType,true), StructField(deptname,StringType,true), StructField(projectcode,IntegerType,true), StructField(projectjoindate,TimestampType,true), StructField(projectenddate,TimestampType,true), StructField(attendance,IntegerType,true), StructField(utilization,IntegerType,true),
[jira] [Comment Edited] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524638#comment-16524638 ] Prasanna Ravichandran edited comment on CARBONDATA-2537 at 6/27/18 6:25 AM:
User "HAVING" queries are accessing the data from the created MV datamap only. Users have to rebuild the datamap once after creation. Closed.
!image-2018-06-27-11-54-31-158.png!
was (Author: prasanna ravichandran):
User queries are accessing the data from the created MV datamap. Users have to rebuild the datamap once after creation. Closed.
!image-2018-06-27-11-54-31-158.png!
[jira] [Commented] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524638#comment-16524638 ] Prasanna Ravichandran commented on CARBONDATA-2537:
---
User queries are accessing the data from the created MV datamap. Users have to rebuild the datamap once after creation. Closed.
!image-2018-06-27-11-54-31-158.png!
[jira] [Updated] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2537:
--
Attachment: image-2018-06-27-11-54-31-158.png
[jira] [Updated] (CARBONDATA-2537) MV Dataset - User queries with 'having' condition is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2537:
--
Attachment: image-2018-06-27-11-53-49-587.png
[jira] [Created] (CARBONDATA-2580) MV Datamap - Cannot create two MV`s with same name in different databases.
Prasanna Ravichandran created CARBONDATA-2580:
-
Summary: MV Datamap - Cannot create two MVs with the same name in different databases.
Key: CARBONDATA-2580
URL: https://issues.apache.org/jira/browse/CARBONDATA-2580
Project: CarbonData
Issue Type: Bug
Components: data-load, data-query
Environment: 3 Node Opensource ANT cluster
Reporter: Prasanna Ravichandran

Cannot create two MVs with the same name in different databases. If you create an MV datamap, say MV1, in the default database, you cannot use the same name (MV1) for another MV datamap in any other database.
Test queries:
scala> carbon.sql("create table ratish(id int, name string) stored by 'carbondata'").show(200,false)
scala> carbon.sql("insert into ratish select 1,'ram'").show(200,false)
scala> carbon.sql("insert into ratish select 2,'ravi'").show(200,false)
scala> carbon.sql("insert into ratish select 3,'raghu'").show(200,false)
scala> carbon.sql("create datamap radi using 'mv' as select name from ratish").show(200,false)
scala> carbon.sql("rebuild datamap radi").show(200,false)
scala> carbon.sql("explain select name from ratish").show(200,false)
== CarbonData Profiler ==
Table Scan on radi_table
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
== Physical Plan ==
*Project [ratish_name#13790 AS name#13818]
+- *BatchedScan CarbonDatasourceHadoopRelation [ Database name :default, Table name :radi_table, Schema :Some(StructType(StructField(ratish_name,StringType,true))) ] default.radi_table[ratish_name#13790]
scala> carbon.sql("create database rad").show(200,false)
scala> carbon.sql("use rad").show(200,false)
scala> carbon.sql("create table ratish(id int, name string) stored by 'carbondata'").show(200,false)
scala> carbon.sql("insert into ratish select 1,'ram'").show(200,false)
scala> carbon.sql("insert into ratish select 2,'ravi'").show(200,false)
scala> carbon.sql("insert into ratish select 3,'raghu'").show(200,false)
scala> carbon.sql("create datamap radi using 'mv' as select name from ratish").show(200,false)
java.io.IOException: DataMap with name radi already exists in storage
 at org.apache.carbondata.core.metadata.schema.table.DiskBasedDMSchemaStorageProvider.saveSchema(DiskBasedDMSchemaStorageProvider.java:70)
 at org.apache.carbondata.core.datamap.DataMapStoreManager.saveDataMapSchema(DataMapStoreManager.java:158)
 at org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:115)
 at org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
 at org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
 at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
 at org.apache.spark.sql.Dataset.(Dataset.scala:183)
 at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
 at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
 at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
 at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
 ... 48 elided
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
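The stack trace above shows the failure comes from DiskBasedDMSchemaStorageProvider.saveSchema, which rejects any datamap name already present in the shared schema store regardless of the current database. Until that changes, a hedged workaround sketch, using the tables from this report, is to make the name globally unique (the name rad_radi below is hypothetical, not from the report):

```sql
-- Assumption: datamap names are unique cluster-wide, so prefix the database name.
use rad;
create datamap rad_radi using 'mv' as select name from ratish;
rebuild datamap rad_radi;
```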
[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2576:
--
Description: MV is not working correctly if there are more than 3 aggregate functions in the same datamap. It works fine with up to 3 aggregate functions in the same MV. Please see the attached document for more details.
Test queries:
scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false)
scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)
scala> carbon.sql("explain select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false)
org.apache.spark.sql.AnalysisException: expression 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]
+- SubqueryAlias datamap_comp_maxsumminavg_table
   +- Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929] CarbonDatasourceHadoopRelation [ Database name :default, Table name :datamap_comp_maxsumminavg_table, Schema :Some(StructType(StructField(origintable_empno,IntegerType,true), StructField(max_projectenddate,TimestampType,true), StructField(sum_salary,LongType,true), StructField(min_projectjoindate,TimestampType,true), StructField(avg_attendance,DoubleType,true))) ]
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
 at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253)
 at scala.collection.immutable.List.foreach(List.scala:381)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280)
 at scala.collection.immutable.List.foreach(List.scala:381)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
 at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
 at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
 at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52)
 at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148)
 at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
 at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72)
 at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38)
 at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46)
 at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27)
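The description states the rewrite works with up to 3 aggregate functions in one MV. A hedged workaround sketch, under that stated assumption, is to split the four aggregates across two MVs so each stays within the working limit (the datamap names below are hypothetical; whether the planner then uses both MVs for the combined query is not verified):

```sql
-- Assumption from the report: at most 3 aggregate functions work per MV.
create datamap datamap_maxsummin using 'mv' as
  select empno, max(projectenddate), sum(salary), min(projectjoindate)
  from originTable group by empno;
create datamap datamap_avg using 'mv' as
  select empno, avg(attendance)
  from originTable group by empno;
rebuild datamap datamap_maxsummin;
rebuild datamap datamap_avg;
```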
[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanna Ravichandran updated CARBONDATA-2576:
--
Attachment: data.csv
[jira] [Updated] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanna Ravichandran updated CARBONDATA-2576:
--
Description: MV does not work correctly if there are more than three aggregate functions in the same datamap; it works fine with up to three aggregate functions on the same MV.

Test queries:

scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false)
++
||
++
++

scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false)
++
||
++
++

scala> carbon.sql("explain select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false)
org.apache.spark.sql.AnalysisException: expression 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [origintable_empno#2925], [origintable_empno#2925 AS empno#3002, max(max_projectenddate#2926) AS max(projectenddate)#3003, sum(sum_salary#2927L) AS sum(salary)#3004L, min(min_projectjoindate#2928) AS min(projectjoindate)#3005, avg_attendance#2929 AS avg(attendance)#3006]
+- SubqueryAlias datamap_comp_maxsumminavg_table
   +- Relation[origintable_empno#2925,max_projectenddate#2926,sum_salary#2927L,min_projectjoindate#2928,avg_attendance#2929] CarbonDatasourceHadoopRelation [ Database name :default, Table name :datamap_comp_maxsumminavg_table, Schema
:Some(StructType(StructField(origintable_empno,IntegerType,true), StructField(max_projectenddate,TimestampType,true), StructField(sum_salary,LongType,true), StructField(min_projectjoindate,TimestampType,true), StructField(avg_attendance,DoubleType,true))) ] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39) at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:247) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1$5.apply(CheckAnalysis.scala:253) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$class$$anonfun$$checkValidAggregateExpression$1(CheckAnalysis.scala:253) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$9.apply(CheckAnalysis.scala:280) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:280) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127) at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:52) at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:148) at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95) at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:72) at org.apache.carbondata.mv.datamap.MVAnalyzerRule.apply(MVAnalyzerRule.scala:38) at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:46) at org.apache.spark.sql.hive.CarbonAnalyzer.execute(CarbonAnalyzer.scala:27) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) at
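The AnalysisException above arises because the rewritten plan projects the MV table's stored avg_attendance column directly, without re-aggregating it under the group by. As an illustration only (this is not CarbonData code, and all names are hypothetical), a materialized view that has to support avg rollup typically stores per-group sum and count and recomputes the average at query time; averaging pre-aggregated averages would be wrong when group sizes differ. A minimal Python sketch of that layout:

```python
# Hypothetical illustration (not CarbonData source): why an MV should
# store per-group sum and count rather than a precomputed average.
rows = [
    ("e1", 10), ("e1", 20), ("e1", 30),  # empno e1: avg should be 20
    ("e2", 40),                          # empno e2: avg should be 40
]

# Materialize per-group (sum, count) -- the rollup-safe MV layout.
mv = {}
for empno, attendance in rows:
    s, c = mv.get(empno, (0, 0))
    mv[empno] = (s + attendance, c + 1)

# Query-time rewrite: avg(attendance) = sum_attendance / count_attendance.
avg_from_mv = {k: s / c for k, (s, c) in mv.items()}
print(avg_from_mv)  # {'e1': 20.0, 'e2': 40.0}
```

With this layout the rewritten query applies sum() over both stored columns and divides, so every projected expression is either grouped or aggregated and the analyzer check passes.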
[jira] [Commented] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500225#comment-16500225 ]
Prasanna Ravichandran commented on CARBONDATA-2576:
---
Please find the queries for the base table creation:

CREATE TABLE originTable (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format';

LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"','timestampformat'='dd-MM-');

The data.csv is also attached.
[jira] [Created] (CARBONDATA-2576) MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap.
Prasanna Ravichandran created CARBONDATA-2576: - Summary: MV Datamap - MV is not working fine if there is more than 3 aggregate function in the same datamap. Key: CARBONDATA-2576 URL: https://issues.apache.org/jira/browse/CARBONDATA-2576 Project: CarbonData Issue Type: Bug Components: data-query Reporter: Prasanna Ravichandran Attachments: From 4th aggregate function -error shown.docx MV is not working fine if there is more than 3 aggregate function in the same datamap. Test queries: scala> carbon.sql("create datamap datamap_comp_maxsumminavg using 'mv' as select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false) ++ || ++ ++ scala> carbon.sql("rebuild datamap datamap_comp_maxsumminavg").show(200,false) ++ || ++ ++ scala> carbon.sql("explain select empno,max(projectenddate),sum(salary),min(projectjoindate),avg(attendance) from originTable group by empno").show(200,false) org.apache.spark.sql.AnalysisException: expression 'datamap_comp_maxsumminavg_table.`avg_attendance`' is neither present in the group by, nor is it an aggregate function. 
[jira] [Created] (CARBONDATA-2574) MV Datamap - MV is not working if there is aggregate function with group by and without any projections.
Prasanna Ravichandran created CARBONDATA-2574:
-
Summary: MV Datamap - MV is not working if there is aggregate function with group by and without any projections.
Key: CARBONDATA-2574
URL: https://issues.apache.org/jira/browse/CARBONDATA-2574
Project: CarbonData
Issue Type: Bug
Components: data-query
Environment: 3 Node Opensource ANT cluster.
Reporter: Prasanna Ravichandran
Attachments: MV_aggregate_without_projection_and_with_groupby.docx, data.csv

The user query does not fetch data from the MV datamap if there is an aggregate function with GROUP BY and no projections.

Test queries (in spark-shell):

scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format'").show(200,false)
++
||
++
++

scala> carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '\"','timestampformat'='dd-MM-')").show(200,false)
++
||
++
++

scala> carbon.sql("create datamap Mv_misscol using 'mv' as select sum(salary) from origintable group by empno").show(200,false)
++
||
++
++

scala> carbon.sql("rebuild datamap Mv_misscol").show(200,false)
++
||
++
++

scala> carbon.sql("explain select sum(salary) from origintable group by empno").show(200,false)
+---+
|plan |
+---+
|== CarbonData Profiler == Table Scan on origintable - total blocklets: 1 - filter: none - pruned by Main DataMap - skipped blocklets: 0
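A likely reason this case is tricky for the rewrite (an interpretation, not confirmed from CarbonData source): the user query `select sum(salary) from origintable group by empno` never projects the grouping key, yet the MV must still key its stored rows by empno to produce one sum per group. A minimal Python sketch of that storage-versus-projection split, with hypothetical names:

```python
# Hypothetical sketch (not CarbonData source): an MV backing
#   SELECT sum(salary) FROM origintable GROUP BY empno
# must key stored rows by empno even though the user query
# never projects empno.
rows = [("e1", 100), ("e1", 200), ("e2", 300)]

# MV storage: one row per empno holding the pre-aggregated sum.
mv = {}
for empno, salary in rows:
    mv[empno] = mv.get(empno, 0) + salary

# Query-time rewrite: read back only the stored sums, dropping the key,
# which matches the user's projection-free SELECT list.
result = sorted(mv.values())
print(result)  # [300, 300]
```

The rewrite therefore has to match the MV's internal grouping column against a key that appears nowhere in the user query's output, which is presumably where the matching logic fails in this report.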