[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565596#comment-16565596 ]

Ravindra Pesala commented on CARBONDATA-2539:
---------------------------------------------

After datamap creation you should rebuild the datamap before accessing it; otherwise it will be disabled.

> MV Dataset - Subqueries is not accessing the data from the MV datamap.
> ----------------------------------------------------------------------
>
>                 Key: CARBONDATA-2539
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2539
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>         Environment: 3 node opensource ANT cluster.
>            Reporter: Prasanna Ravichandran
>            Assignee: Ravindra Pesala
>            Priority: Minor
>             Fix For: 1.5.0, 1.4.1
>
>         Attachments: data.csv
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The inner subquery is not accessing the data from the MV datamap; it is accessing the data from the main table.
>
> Test queries - Spark shell:
>
> scala> carbon.sql("drop table if exists origintable").show()
> ++
> ||
> ++
> ++
>
> scala> carbon.sql("CREATE TABLE originTable (empno int, empname String, designation String, doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, projectcode int, projectjoindate Timestamp, projectenddate Timestamp, attendance int, utilization int, salary int) STORED BY 'org.apache.carbondata.format'").show(200,false)
> ++
> ||
> ++
> ++
>
> scala> carbon.sql("LOAD DATA local inpath 'hdfs://hacluster/user/prasanna/data.csv' INTO TABLE originTable OPTIONS('DELIMITER'= ',', 'QUOTECHAR'= '\"','timestampformat'='dd-MM-')").show(200,false)
> ++
> ||
> ++
> ++
>
> scala> carbon.sql("drop datamap datamap_subqry").show(200,false)
> ++
> ||
> ++
> ++
>
> scala> carbon.sql("create datamap datamap_subqry using 'mv' as select min(salary) from originTable group by empno").show(200,false)
> ++
> ||
> ++
> ++
>
> scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN (select min(salary) from originTable group by empno) group by empname").show(200,false)
> ++
> |plan |
>
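The rebuild step Ravindra refers to could look like the following in the same spark-shell session. This is a sketch only: it assumes the deferred-rebuild behaviour described in the comment and the standard CarbonData `REBUILD DATAMAP` statement; the datamap name `datamap_subqry` is taken from the reproduction steps above.

```scala
// Sketch, assuming the datamap from the repro steps above.
// An MV datamap may be created in a disabled state; rebuilding it
// loads the materialized data and enables it for query rewrite.
scala> carbon.sql("REBUILD DATAMAP datamap_subqry").show(200, false)

// Re-running the explain should now show the plan reading from the
// datamap table instead of originTable, if the rewrite applies.
scala> carbon.sql("explain SELECT max(empno) FROM originTable WHERE salary IN (select min(salary) from originTable group by empno) group by empname").show(200, false)
```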
[jira] [Commented] (CARBONDATA-2539) MV Dataset - Subqueries is not accessing the data from the MV datamap.
[ https://issues.apache.org/jira/browse/CARBONDATA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565103#comment-16565103 ]

Prasanna Ravichandran commented on CARBONDATA-2539:
---------------------------------------------------

Still the sub-queries are not accessing the data from the MV datamap.

Terminal:

> create datamap dm3 using 'mv' as *select min(workgroupcategory) from origintable*;
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.392 seconds)

> select distinct workgroupcategory from originTable;
+--------------------+--+
| workgroupcategory  |
+--------------------+--+
| 1                  |
| 3                  |
| 2                  |
+--------------------+--+
3 rows selected (0.664 seconds)

> select count(*) from originTable where workgroupcategory=1;
+-----------+--+
| count(1)  |
+-----------+--+
| 5         |
+-----------+--+
1 row selected (0.349 seconds)

> explain SELECT max(empno) FROM originTable WHERE workgroupcategory IN (*select min(workgroupcategory) from originTable*) group by empname;
+-------+--+
| plan  |
+-------+--+
| == CarbonData Profiler ==
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
Table Scan on origintable
 - total blocklets: 1
 - filter: none
 - pruned by Main DataMap
 - skipped blocklets: 0
|
| == Physical Plan ==
*HashAggregate(keys=[empname#24982], functions=[max(empno#24981)])
+- Exchange hashpartitioning(empname#24982, 200)
   +- *HashAggregate(keys=[empname#24982], functions=[partial_max(empno#24981)])
      +- *Project [empno#24981, empname#24982]
         +- *BroadcastHashJoin [workgroupcategory#24985], [*min(workgroupcategory)*#25804], LeftSemi, BuildRight
            :- *FileScan carbondata *rtyo.origintable*[empno#24981,empname#24982,designation#24983,doj#24984,workgroupcategory#24985,workgroupcategoryname#24986,deptno#24987,deptname#24988,projectcode#24989,projectjoindate#24990,projectenddate#24991,attendance#24992,utilization#24993,salary#24994]
            +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
               +- *HashAggregate(keys=[], functions=[min(workgroupcategory#24985)])
                  +- Exchange SinglePartition
                     +- *HashAggregate(keys=[], functions=[partial_min(workgroupcategory#24985)])
                        +- *FileScan carbondata *rtyo.origintable*[workgroupcategory#24985] |
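Both `FileScan carbondata` nodes in the plan above read `rtyo.origintable` directly, which is the symptom being reported: neither the outer query nor the inner subquery was rewritten against `dm3`. Before re-testing, the datamap's enabled state and rebuild status could be checked in the same beeline session; this is a sketch assuming the `SHOW DATAMAP` and `REBUILD DATAMAP` statements behave as described in Ravindra's comment (a freshly created MV datamap stays disabled until rebuilt).

```sql
-- Sketch: list datamaps on the table to check whether dm3 is enabled.
show datamap on table originTable;

-- Rebuild the MV datamap so it is loaded and enabled for query rewrite,
-- then re-run the explain to see whether the scan switches to the datamap table.
rebuild datamap dm3;
explain SELECT max(empno) FROM originTable WHERE workgroupcategory IN (select min(workgroupcategory) from originTable) group by empname;
```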