[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.

2018-08-01 Thread Ravindra Pesala (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565596#comment-16565596
 ] 

Ravindra Pesala commented on CARBONDATA-2539:
----------------------------------------------

After datamap creation, you should rebuild the datamap before accessing it; 
otherwise it will remain disabled.
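
For example, with the datamap and spark-shell session from the reproduction 
steps quoted below, the rebuild would look roughly like this (a minimal 
sketch, assuming the REBUILD DATAMAP statement is available for MV datamaps 
in this version):

scala> carbon.sql("rebuild datamap datamap_subqry").show(200, false)

scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN (select min(salary) from originTable group by empno) group by empname").show(200, false)

Once the datamap is rebuilt and enabled, the explain output should show the 
subquery being served from the MV table instead of origintable.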

> MV Dataset - Subqueries is not accessing the data from the MV datamap.
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-2539
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2539
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
> Environment: 3 node opensource ANT cluster.
>Reporter: Prasanna Ravichandran
>Assignee: Ravindra Pesala
>Priority: Minor
> Fix For: 1.5.0, 1.4.1
>
> Attachments: data.csv
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Inner subquery is not accessing the data from the MV datamap. It is accessing 
> the data from the main table.
> Test queries - Spark shell:
> scala> carbon.sql("drop table if exists origintable").show()
> ++
> ||
> ++
> ++
>  scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, 
> designation String, doj Timestamp, workgroupcategory int, 
> workgroupcategoryname String, deptno int, deptname String, projectcode int, 
> projectjoindate Timestamp, projectenddate Timestamp,attendance int, 
> utilization int,salary int) STORED BY 
> 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("LOAD DATA local inpath 
> 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable 
> OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= 
> '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>  
> scala> carbon.sql("drop datamap datamap_subqry").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("create datamap datamap_subqry using 'mv' as select 
> min(salary) from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
> scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN 
> (select min(salary) from originTable group by empno ) group by 
> empname").show(200,false)
> ++
> |plan |
> 

[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.

2018-08-01 Thread Prasanna Ravichandran (JIRA)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565103#comment-16565103
 ] 

Prasanna Ravichandran commented on CARBONDATA-2539:
----------------------------------------------------

The sub-queries are still not accessing the data from the MV datamap.

Terminal:

> create datamap dm3 using 'mv' as *select min(workgroupcategory) from 
> origintable*;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.392 seconds)

> select distinct workgroupcategory from originTable;
+--------------------+--+
| workgroupcategory  |
+--------------------+--+
| 1                  |
| 3                  |
| 2                  |
+--------------------+--+
3 rows selected (0.664 seconds)

> select count(*) from originTable where workgroupcategory=1;
+-----------+--+
| count(1)  |
+-----------+--+
| 5         |
+-----------+--+
1 row selected (0.349 seconds)

> explain SELECT max(empno) FROM originTable WHERE workgroupcategory IN 
>(*select min(workgroupcategory) from originTable*) group by empname;
+-+--+
| plan |
+-+--+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
 |
| == Physical Plan ==
*HashAggregate(keys=[empname#24982], functions=[max(empno#24981)])
+- Exchange hashpartitioning(empname#24982, 200)
   +- *HashAggregate(keys=[empname#24982], functions=[partial_max(empno#24981)])
      +- *Project [empno#24981, empname#24982]
         +- *BroadcastHashJoin [workgroupcategory#24985], [*min(workgroupcategory)*#25804], LeftSemi, BuildRight
            :- *FileScan carbondata *rtyo.origintable*[empno#24981,empname#24982,designation#24983,doj#24984,workgroupcategory#24985,workgroupcategoryname#24986,deptno#24987,deptname#24988,projectcode#24989,projectjoindate#24990,projectenddate#24991,attendance#24992,utilization#24993,salary#24994]
            +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
               +- *HashAggregate(keys=[], functions=[min(workgroupcategory#24985)])
                  +- Exchange SinglePartition
                     +- *HashAggregate(keys=[], functions=[partial_min(workgroupcategory#24985)])
                        +- *FileScan carbondata *rtyo.origintable*[workgroupcategory#24985] |
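
For reference, the datamap status can be checked and the datamap rebuilt from 
the same session before re-running the explain (a hedged sketch, assuming the 
SHOW DATAMAP and REBUILD DATAMAP statements available for MV datamaps in this 
CarbonData version):

> show datamap on table origintable;
> rebuild datamap dm3;

A freshly created MV datamap that has not been rebuilt stays disabled, which 
is consistent with the plan above scanning rtyo.origintable for both the 
outer query and the subquery.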