Exception during Execution Planning for joins

Mustafa Engin Sözer Thu, 18 Jun 2015 06:47:32 -0700

Hi everyone,

I've had an earlier topic regarding this issue but no resolution came out
of this and it couldn't be reproduced. Let me re-describe the issue and my
cluster:


Currently I have a 5-node Mapr cluster on AWS, including Drill. On both
sides, the security is enabled and on drill, impersonation is also enabled.
The only other configuration I changed in drill was the new views
permissions which I set to 750. I'm using maprfs and our MapR version is
4.1.0 and Drill version is 1.0.0.

So the process goes like this:

I have two users involved in this process, called usr1 and usr2. usr1 is
kind of an admin for the raw data whereas usr2 is not allowed to access to
raw data.

usr1 writes 3 csv files to /raw/costcenter volume and creates a relational
model using drill views. These views are written to /views/costcenter where
usr2 has access to. So usr2 can query these views without any issues.

So there comes the problem. Along with several other tables, I have 2
views, namely fact_account and dim_costcenter (created out of the same csv)
Here are the table definitions:

describe dfs.views_costcenter.fact_account;
+----------------+------------+--------------+
|  COLUMN_NAME   | DATA_TYPE  | IS_NULLABLE  |
+----------------+------------+--------------+
| account_id     | INTEGER    | YES          |
| costcenter_id  | INTEGER    | YES          |
| account_date   | DATE       | YES          |
| account_value  | INTEGER    | YES          |
+----------------+------------+--------------+

describe dfs.views_costcenter.dim_costcenter;
+-----------------------+------------+--------------+
|      COLUMN_NAME      | DATA_TYPE  | IS_NULLABLE  |
+-----------------------+------------+--------------+
| costcenter_id         | INTEGER    | YES          |
| costcenter_desc       | VARCHAR    | YES          |
| costcenter_name_id    | INTEGER    | YES          |
| costcenter_name_desc  | VARCHAR    | YES          |
| department_id         | INTEGER    | YES          |
| division_id           | INTEGER    | YES          |
| area_id               | INTEGER    | YES          |
+-----------------------+------------+--------------+

Both tables have 582 rows.

So I need to join these two tables and run some aggregations on them in
order to create a report. I have the following query:

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss from dfs.views_costcenter.fact_account a11
join dfs.views_costcenter.dim_costcenter a12 on (a11.costcenter_id =
a12.costcenter_id) group by a11.costcenter_id;

When I run this query, the execution planner throws a huge exception as you
can see below. However, I've found a strange solution to that. If I
exchange the order of the tables within the join, ie.

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss from dfs.views_costcenter.dim_costcenter a12
join dfs.views_costcenter.fact_account a11 on (a11.costcenter_id =
a12.costcenter_id) group by a11.costcenter_id;

It works perfectly. So in summary, if I write t2 join t1 instead of t1 join
t2 and change nothing else, it works like a charm. As inner join is
commutative and associative, this was completely unexpected for me. Can
someone confirm if this is a bug? I didn't want to file a bug to JIRA
before asking you guys here first.

Thanks in advance for your help.

Below you can find the exception: (as the exception is huge, I've only
posted part of it here. Please let me know if you need the complete
exception)


Error: SYSTEM ERROR:
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be implemented;
planner state:

Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
Original rel:
AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]],
convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])],
sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
  DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]): rowcount
= 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 15144
    DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]],
group=[{0}], costcenter_desc=[MAX($1)], costcenter_name_desc=[MAX($2)],
sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0
io, 0.0 network, 0.0 memory}, id = 15142
      DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]],
costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7],
account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
        DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0, 2,
4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9],
account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1],
costcenter_name_id=[$2], costcenter_name_desc=[$3], department_id=[$4],
division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost = {0.0
rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
          DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0, 2,
4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount = 1018.0,
cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network, 1791.68
memory}, id = 15136
            DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0, 2,
4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6],
dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount =
1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0 io, 0.0
network, 0.0 memory}, id = 15127

DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]],
costcenter_id=[CAST(ITEM($0, 1)):INTEGER], costcenter_desc=[CAST(ITEM($0,
12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE
"ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0,
11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45)
CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0,
7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0,
cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 15125

DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]], table=[[dfs,
raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
[selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10],
`columns`[9], `columns`[7], `columns`[5]],
files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount =
1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network,
0.0 memory}, id = 15064
            DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0, 1,
2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828 rows,
101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134

DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]],
group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132

DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]],
account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0,
1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE],
account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0, cumulative
cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
15130

DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]], table=[[dfs,
raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
[selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]],
files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount =
1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network,
0.0 memory}, id = 15074

Sets:
Set#16, type: RecordType(ANY columns)
rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064,
importance=0.4304672100000001


-- 

*M. Engin Sözer*
Junior Datawarehouse Manager
[email protected]

Goodgame Studios
Theodorstr. 42-90, House 9
22761 Hamburg, Germany
Phone: +49 (0)40 219 880 -0
*www.goodgamestudios.com <http://www.goodgamestudios.com>*

Goodgame Studios is a branch of Altigi GmbH
Altigi GmbH, District court Hamburg, HRB 99869
Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
Ritter

Exception during Execution Planning for joins

Reply via email to