[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Ravindra Pesala (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565578#comment-16565578
 ] 

Ravindra Pesala commented on CARBONDATA-2534:
-

After creating the datamap, you should rebuild it before accessing it; 
otherwise it will be disabled.
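
For reference, the suggested sequence would look roughly like the following in the same terminal session. This is only a sketch: it reuses the datamap name and query from this issue, and it has not been run against the reporter's cluster. The expectation that the plan then reads from the MV table (named `mv_substr_table` in the original error message) is an assumption that depends on the fix version in use.

```sql
-- Sketch only: names and query are taken from this issue.
-- 1. Create the MV datamap (as in the report).
CREATE DATAMAP mv_substr USING 'mv' AS
  SELECT sum(salary), substring(empname,2,5), designation
  FROM originTable
  GROUP BY substring(empname,2,5), designation;

-- 2. Rebuild the datamap so that it is loaded and enabled.
REBUILD DATAMAP mv_substr;

-- 3. Re-check the plan. Assumption: once the datamap is enabled, the scan
--    should target the MV table instead of origintable.
EXPLAIN
  SELECT sum(salary), substring(empname,2,5), designation
  FROM originTable
  GROUP BY substring(empname,2,5), designation;
```

If the plan still shows a scan on origintable after the rebuild, that would point to a query-matching problem rather than a disabled datamap.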

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Fix For: 1.5.0, 1.4.1
>
> Attachments: MV_substring.docx, data.csv
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> MV creation does not work with the substring function. We get a 
> spark.sql.AnalysisException when trying to create an MV with the substring 
> and aggregate functions. 
> *Spark-shell test queries:*
>  scala> carbon.sql("create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation").show(200,false)
> *org.apache.spark.sql.AnalysisException: Cannot create a table having a column whose name contains commas in Hive metastore. Table: `default`.`mv_substr_table`; Column: substring_empname,_2,_5;*
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:150)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema$2.apply(HiveExternalCatalog.scala:148)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$verifyDataSchema(HiveExternalCatalog.scala:148)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:222)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:110)
>  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:316)
>  at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:119)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at org.apache.spark.sql.CarbonSession.withProfiler(CarbonSession.scala:155)
>  at org.apache.spark.sql.CarbonSession.sql(CarbonSession.scala:95)
>  at org.apache.spark.sql.execution.command.table.CarbonCreateTableCommand.processMetadata(CarbonCreateTableCommand.scala:126)
>  at org.apache.spark.sql.execution.command.MetadataCommand.run(package.scala:68)
>  at org.apache.carbondata.mv.datamap.MVHelper$.createMVDataMap(MVHelper.scala:103)
>  at org.apache.carbondata.mv.datamap.MVDataMapProvider.initMeta(MVDataMapProvider.scala:53)
>  at org.apache.spark.sql.execution.command.datamap.CarbonCreateDataMapCommand.processMetadata(CarbonCreateDataMapCommand.scala:118)
>  at org.apache.spark.sql.execution.command.AtomicRunnableCommand.run(package.scala:90)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
>  at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:108)
>  at org.apache.spark.sql.CarbonSession$$anonfun$sql$1.apply(CarbonSession.scala:97)
>  at 

[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565167#comment-16565167
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

When the user runs the query that the MV datamap was created from, the data 
should be read from the MV table.


[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565016#comment-16565016
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

MV creation with the substring function now works without any error, but when 
the user runs the query the MV was defined on, the data is not read from the 
MV datamap.

*Terminal:*

> create datamap mv_substr using 'mv' as select 
> sum(salary),substring(empname,2,5),designation from originTable group by 
> substring(empname,2,5),designation;
+-+--+
| Result |
+-+--+
+-+--+
No rows selected (0.661 seconds)

> explain select sum(salary),substring(empname,2,5),designation from 
> originTable group by substring(empname,2,5),designation;
+--+--+
| plan |
+--+--+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 2
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*HashAggregate(keys=[substring(empname#18267, 2, 5)#18352, designation#18268], 
functions=[sum(cast(salary#18279 as bigint))])
+- Exchange hashpartitioning(substring(empname#18267, 2, 5)#18352, 
designation#18268, 200)
 +- *HashAggregate(keys=[substring(empname#18267, 2, 5) AS 
substring(empname#18267, 2, 5)#18352, designation#18268], 
functions=[partial_sum(cast(salary#18279 as bigint))])
 +- *FileScan carbondata 
*b011.origintable*[empname#18267,designation#18268,salary#18279] |
+--+--+
2 rows selected (0.432 seconds)


[jira] [Commented] (CARBONDATA-2534) MV Dataset - MV creation is not working with the substring()

2018-06-04 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500032#comment-16500032
 ] 

Prasanna Ravichandran commented on CARBONDATA-2534:
---

Base table queries:

CREATE TABLE originTable (empno int, empname String, designation String, doj Timestamp,
workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,
projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int,
utilization int, salary int)
STORED BY 'org.apache.carbondata.format';

LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable
OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '"','timestampformat'='dd-MM-');

> MV Dataset - MV creation is not working with the substring() 
> -
>
> Key: CARBONDATA-2534
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2534
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster
>Reporter: Prasanna Ravichandran
>Priority: Minor
>  Labels: CarbonData, MV, Materialistic_Views
> Attachments: MV_substring.docx
>
>