A JIRA ticket has already been created for this issue: https://issues.apache.org/jira/browse/DRILL-3330
Thanks,
Rajkumar Singh
MapR Technologies

On Jun 22, 2015, at 2:04 PM, Mustafa Engin Sözer <[email protected]> wrote:

Hi everyone,

Thanks for the help. I was unable to read through the mails over the
weekend, so I only just saw that the problem has been confirmed here. Do
you need anything from me in order to pinpoint the problem?

Has a JIRA ticket already been created, or should I create one myself?

Thanks again.

On 22 June 2015 at 07:06, Rajkumar Singh <[email protected]> wrote:

Hi Hao,

I have only one row in my sample data; the queries I used to reproduce
the issue are below.

rsingh@Administrators-MacBook-Pro-4 ~/Downloads/apache-drill-1.0.0$ cat sample-data/master-data.csv
1,1,20.4.09,201,Orci Donec Nibh PC,1,[email protected],200,Dai Woodward,300,Hannah Sims,400,Abdul

CREATE VIEW dfs.tmp.`fact_a` AS
select distinct
  CAST(columns[0] as INTEGER) as b_id,
  CAST(columns[1] as INTEGER) as c_id,
  CAST(columns[2] as DATE) as a_date,
  CAST(columns[3] as INTEGER) as a_value
from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
order by b_id, c_id, a_date;

CREATE VIEW dfs.tmp.`dim_c` AS
select
  CAST(columns[1] as INTEGER) as c_id,
  CAST(columns[12] as VARCHAR) as c_desc,
  CAST(columns[11] as INTEGER) as c4_id,
  CAST(columns[10] as VARCHAR) as c4_desc,
  CAST(columns[9] as INTEGER) as c3_id,
  CAST(columns[7] as INTEGER) as c2_id,
  CAST(columns[5] as INTEGER) as c1_id
from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
order by c_id, c4_id, c3_id, c2_id, c1_id;

select a11.c_id c_id,
       max(a12.c_desc) c_desc,
       max(a12.c4_desc) c4_desc,
       sum(a11.a_value) a_value
from dfs.tmp.`fact_a` a11
join dfs.tmp.`dim_c` a12
  on (a11.c_id = a12.c_id)
group by a11.c_id;

Rajkumar Singh
MapR Technologies

On Jun 21, 2015, at 1:34 AM, Hao Zhu <[email protected]> wrote:

Nice.
How many rows did you create? I am just curious what the difference is
from my reproduction.

Here is mine:

CREATE VIEW view_fact_account AS
select cast(columns[0] as int) account_id,
       cast(columns[1] as int) costcenter_id,
       cast(columns[2] as date) account_date,
       cast(columns[3] as int) account_value
from dfs.drill.fact_account;

CREATE VIEW view_dim_costcenter AS
select cast(columns[0] as int) costcenter_id,
       cast(columns[1] as varchar) costcenter_desc,
       cast(columns[2] as int) costcenter_name_id,
       cast(columns[3] as varchar) costcenter_name_desc,
       cast(columns[4] as int) department_id,
       cast(columns[5] as int) division_id,
       cast(columns[6] as int) area_id
from dfs.drill.dim_costcenter;

Both tables have only 1 row.

0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_dim_costcenter;
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
| costcenter_id  | costcenter_desc  | costcenter_name_id  | costcenter_name_desc  | department_id  | division_id  | area_id  |
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
| 2              | a                | 2                   | a                     | 3              | 4            | 5        |
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
1 row selected (0.255 seconds)

0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_fact_account;
+-------------+----------------+---------------+----------------+
| account_id  | costcenter_id  | account_date  | account_value  |
+-------------+----------------+---------------+----------------+
| 1           | 2              | 2015-01-01    | 3              |
+-------------+----------------+---------------+----------------+
1 row selected (0.141 seconds)

Thanks,
Hao

On Sat, Jun 20, 2015 at 9:31 AM, Rajkumar Singh <[email protected]> wrote:

Hi Hao,

I tried to reproduce the issue and was able to do so. I am running
Drill 1.0.0 in embedded mode with a small data set on my Mac.

For the bad SQL I get the Calcite CannotPlanException (the same error
stack as reported); for the good SQL, the explain plan is below.

00-00 Screen : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.1 rows, 129.1 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1346
00-01   Project(c_id=[$0], c_desc=[$1], c4_desc=[$2], a_value=[$3]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1345
00-02     HashAgg(group=[{0}], c_desc=[MAX($1)], c4_desc=[MAX($2)], a_value=[SUM($3)]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1344
00-03       Project(c_id=[$8], c_desc=[$1], c4_desc=[$3], a_value=[$10]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1343
00-04         HashJoin(condition=[=($8, $0)], joinType=[inner]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id, INTEGER b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1342
00-05           Project(b_id=[$0], c_id0=[$1], a_date=[$2], a_value=[$3]) : rowType = RecordType(INTEGER b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1341
00-07             SelectionVectorRemover : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1340
00-09               Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {7.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1339
00-11                 SelectionVectorRemover : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {6.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1338
00-13                   Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {5.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1337
00-14                     StreamAgg(group=[{0, 1, 2, 3}]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {4.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 32.0 memory}, id = 1336
00-15                       Sort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 32.0 memory}, id = 1335
00-16                         Project(b_id=[CAST(ITEM($0, 0)):INTEGER], c_id=[CAST(ITEM($0, 1)):INTEGER], a_date=[CAST(ITEM($0, 2)):DATE], a_value=[CAST(ITEM($0, 3)):INTEGER]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {2.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1334
00-17                           Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]], files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]]) : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1333
00-06           SelectionVectorRemover : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {4.0 rows, 30.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1332
00-08             Sort(sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {3.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1331
00-10               Project(c_id=[CAST(ITEM($0, 1)):INTEGER], c_desc=[CAST(ITEM($0, 12)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], c4_id=[CAST(ITEM($0, 11)):INTEGER], c4_desc=[CAST(ITEM($0, 10)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], c3_id=[CAST(ITEM($0, 9)):INTEGER], c2_id=[CAST(ITEM($0, 7)):INTEGER], c1_id=[CAST(ITEM($0, 5)):INTEGER]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {2.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1330
00-12                 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv, numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10], `columns`[9], `columns`[7], `columns`[5]], files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]]) : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1329

{
  "head" : {
    "version" : 1,
    "generator" : { "type" : "ExplainHandler", "info" : "" },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "fs-scan",
    "@id" : 12,
    "userName" : "rsingh",
    "files" : [ "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv" ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "file:///",
      "workspaces" : {
        "root" : { "location" : "/", "writable" : false, "defaultInputFormat" : null },
        "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null }
      },
      "formats" : {
        "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" },
        "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
        "tsv" : { "type" : "text", "extensions" : [ "tsv" ], "delimiter" : "\t" },
        "parquet" : { "type" : "parquet" },
        "json" : { "type" : "json" },
        "avro" : { "type" : "avro" }
      }
    },
    "format" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
    "columns" : [ "`columns`[1]", "`columns`[12]", "`columns`[11]", "`columns`[10]", "`columns`[9]", "`columns`[7]", "`columns`[5]" ],
    "selectionRoot" : "/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv",
    "cost" : 1.0
  }, {
    "pop" : "project",
    "@id" : 10,
    "exprs" : [
      { "ref" : "`c_id`", "expr" : "cast( (`columns`[1] ) as INT )" },
      { "ref" : "`c_desc`", "expr" : "cast( (`columns`[12] ) as VARCHAR(1) )" },
      { "ref" : "`c4_id`", "expr" : "cast( (`columns`[11] ) as INT )" },
      { "ref" : "`c4_desc`", "expr" : "cast( (`columns`[10] ) as VARCHAR(1) )" },
      { "ref" : "`c3_id`", "expr" : "cast( (`columns`[9] ) as INT )" },
      { "ref" : "`c2_id`", "expr" : "cast( (`columns`[7] ) as INT )" },
      { "ref" : "`c1_id`", "expr" : "cast( (`columns`[5] ) as INT )" }
    ],
    "child" : 12,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "external-sort",
    "@id" : 8,
    "child" : 10,
    "orderings" : [
      { "expr" : "`c_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c4_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c3_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c2_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c1_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" }
    ],
    "reverse" : false,
    "initialAllocation" : 20000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "selection-vector-remover",
    "@id" : 6,
    "child" : 8,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "fs-scan",
    "@id" : 17,
    "userName" : "rsingh",
    "files" : [ "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv" ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "file:///",
      "workspaces" : {
        "root" : { "location" : "/", "writable" : false, "defaultInputFormat" : null },
        "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null }
      },
      "formats" : {
        "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" },
        "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
        "tsv" : {
...

Rajkumar Singh
MapR Technologies

On Jun 20, 2015, at 9:07 PM, Hao Zhu <[email protected]> wrote:

Hello,

I tried to create the same data in my lab with Drill 1.0 on MapR 4.1;
however, both SQL statements work fine on my end:

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from view_fact_account a11
join view_dim_costcenter a12 on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;
+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.302 seconds)

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from view_dim_costcenter a12
join view_fact_account a11 on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;
+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.209 seconds)

To narrow down the issue, could you test a few things?

1. Does this issue only happen with usr2? Do you see the same issue
with usr1 as well? I just want to confirm whether the issue is related
to impersonation or permissions.

2. Does this issue only happen with the 582-row tables? That is, if the
two tables have fewer rows, does the issue still reproduce? In my test I
created only 1 row. I just want to know whether the issue is data
driven.

3. Could you attach the profiles for the good SQL and the bad SQL, so
that the SQL plans are more readable?

Thanks,
Hao

On Thu, Jun 18, 2015 at 6:46 AM, Mustafa Engin Sözer <[email protected]> wrote:

Hi everyone,

I raised this issue in an earlier thread, but it could not be reproduced
and no resolution came out of it. Let me re-describe the issue and my
cluster.

I currently have a 5-node MapR cluster on AWS, with Drill installed.
Security is enabled on both sides, and impersonation is also enabled in
Drill. The only other Drill configuration I changed is the permission
for newly created views, which I set to 750. I am using maprfs; our MapR
version is 4.1.0 and our Drill version is 1.0.0.

The process goes like this: two users are involved, usr1 and usr2.
usr1 is effectively an admin for the raw data, whereas usr2 is not
allowed to access the raw data.

usr1 writes 3 CSV files to the /raw/costcenter volume and creates a
relational model using Drill views. These views are written to
/views/costcenter, which usr2 has access to, so usr2 can query them
without any issues.

Now for the problem. Along with several other tables, I have 2 views,
namely fact_account and dim_costcenter (created from the same CSV).
Here are the table definitions:

describe dfs.views_costcenter.fact_account;
+----------------+------------+--------------+
| COLUMN_NAME    | DATA_TYPE  | IS_NULLABLE  |
+----------------+------------+--------------+
| account_id     | INTEGER    | YES          |
| costcenter_id  | INTEGER    | YES          |
| account_date   | DATE       | YES          |
| account_value  | INTEGER    | YES          |
+----------------+------------+--------------+

describe dfs.views_costcenter.dim_costcenter;
+-----------------------+------------+--------------+
| COLUMN_NAME           | DATA_TYPE  | IS_NULLABLE  |
+-----------------------+------------+--------------+
| costcenter_id         | INTEGER    | YES          |
| costcenter_desc       | VARCHAR    | YES          |
| costcenter_name_id    | INTEGER    | YES          |
| costcenter_name_desc  | VARCHAR    | YES          |
| department_id         | INTEGER    | YES          |
| division_id           | INTEGER    | YES          |
| area_id               | INTEGER    | YES          |
+-----------------------+------------+--------------+

Both tables have 582 rows.

I need to join these two tables and run some aggregations on them in
order to create a report.
I have the following query:

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from dfs.views_costcenter.fact_account a11
join dfs.views_costcenter.dim_costcenter a12
  on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;

When I run this query, the execution planner throws a huge exception, as
you can see below. However, I have found a strange workaround: if I
exchange the order of the tables within the join, i.e.

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from dfs.views_costcenter.dim_costcenter a12
join dfs.views_costcenter.fact_account a11
  on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;

it works perfectly. In summary, if I write t2 join t1 instead of t1 join
t2 and change nothing else, it works like a charm. Since an inner join
is commutative and associative, this was completely unexpected. Can
someone confirm whether this is a bug? I did not want to file a JIRA
issue before asking here first.

Thanks in advance for your help.

Below is the exception. As it is huge, I have only posted part of it
here; please let me know if you need the complete exception.

Error: SYSTEM ERROR: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node [rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:

Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
Original rel:
AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
  DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]): rowcount = 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15144
    DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]], group=[{0}], costcenter_desc=[MAX($1)], costcenter_name_desc=[MAX($2)], sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15142
      DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]], costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7], account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
        DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9], account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1], costcenter_name_id=[$2], costcenter_name_desc=[$3], department_id=[$4], division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
          DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount = 1018.0, cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network, 1791.68 memory}, id = 15136
            DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount = 1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15127
              DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], costcenter_desc=[CAST(ITEM($0, 12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0, 11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0, 7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15125
                DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10], `columns`[9], `columns`[7], `columns`[5]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064
            DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0, 1, 2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828 rows, 101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134
              DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]], group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132
                DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]], account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE], account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15130
                  DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15074

Sets:
Set#16, type: RecordType(ANY columns)
  rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064, importance=0.4304672100000001
...

--

M. Engin Sözer
Junior Datawarehouse Manager
[email protected]

Goodgame Studios
Theodorstr. 42-90, House 9
22761 Hamburg, Germany
Phone: +49 (0)40 219 880-0
www.goodgamestudios.com

Goodgame Studios is a branch of Altigi GmbH
Altigi GmbH, District court Hamburg, HRB 99869
Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian Ritter
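[Archive editor's note] The workaround discussed in this thread hinges on the point that an inner join is commutative: swapping the operand order must not change the result set, which is why the planner failing on only one ordering looks like a bug. A minimal sketch of that equivalence, using Python's sqlite3 as a stand-in for Drill (the table shapes and the single row mirror Hao's one-row reproduction above; this is an illustration, not a Drill test):

```python
import sqlite3

# In-memory tables mirroring the view_fact_account / view_dim_costcenter repro.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_account(account_id INT, costcenter_id INT,
                          account_date TEXT, account_value INT);
CREATE TABLE dim_costcenter(costcenter_id INT, costcenter_desc TEXT,
                            costcenter_name_id INT, costcenter_name_desc TEXT,
                            department_id INT, division_id INT, area_id INT);
INSERT INTO fact_account VALUES (1, 2, '2015-01-01', 3);
INSERT INTO dim_costcenter VALUES (2, 'a', 2, 'a', 3, 4, 5);
""")

# The ordering that fails in the report: fact join dim.
q1 = """select a11.costcenter_id, max(a12.costcenter_desc),
               max(a12.costcenter_name_desc), sum(a11.account_value)
        from fact_account a11
        join dim_costcenter a12 on a11.costcenter_id = a12.costcenter_id
        group by a11.costcenter_id"""

# The workaround ordering: dim join fact, everything else unchanged.
q2 = """select a11.costcenter_id, max(a12.costcenter_desc),
               max(a12.costcenter_name_desc), sum(a11.account_value)
        from dim_costcenter a12
        join fact_account a11 on a11.costcenter_id = a12.costcenter_id
        group by a11.costcenter_id"""

r1 = con.execute(q1).fetchall()
r2 = con.execute(q2).fetchall()
print(r1)        # [(2, 'a', 'a', 3)]
print(r1 == r2)  # True: inner join is commutative, both orderings agree
```

Both orderings return the same single aggregated row, matching the results Hao posted; the expectation in the thread is that Drill should plan both queries identically.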
