Re: Exception during Execution Planning for joins

Hao Zhu Sat, 20 Jun 2015 08:38:14 -0700

Hello,

I tried to create the same data in my lab with Drill 1.0 on MapR 4.1,
however both SQL works fine in my end:
select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss from view_fact_account a11
join view_dim_costcenter a12 on (a11.costcenter_id =
a12.costcenter_id) group by a11.costcenter_id;
+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.302 seconds)


select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss from view_dim_costcenter a12
join view_fact_account a11 on (a11.costcenter_id =
a12.costcenter_id) group by a11.costcenter_id;

+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.209 seconds)

To narrow down the issue, could you test something:
1. Is this issue only happening with user2? Do you have the same issue
using user1 also?
Just want to confirm if this issue is related to impersonation or
permission.

2. Is this issue only happening with the 582 rows table?
I mean, if the 2 tables have fewer rows, can this issue reproduce?
In my test, I only created 1 row.
I just want to know if this issue is data driver or not.

3. Could you attach the good SQL and bad SQL profiles, so that the SQL plan
is more readable?

Thanks,
Hao

On Thu, Jun 18, 2015 at 6:46 AM, Mustafa Engin Sözer <
[email protected]> wrote:

> Hi everyone,
>
> I've had an earlier topic regarding this issue but no resolution came out
> of this and it couldn't be reproduced. Let me re-describe the issue and my
> cluster:
>
> Currently I have a 5-node Mapr cluster on AWS, including Drill. On both
> sides, the security is enabled and on drill, impersonation is also enabled.
> The only other configuration I changed in drill was the new views
> permissions which I set to 750. I'm using maprfs and our MapR version is
> 4.1.0 and Drill version is 1.0.0.
>
> So the process goes like this:
>
> I have two users involved in this process, called usr1 and usr2. usr1 is
> kind of an admin for the raw data whereas usr2 is not allowed to access to
> raw data.
>
> usr1 writes 3 csv files to /raw/costcenter volume and creates a relational
> model using drill views. These views are written to /views/costcenter where
> usr2 has access to. So usr2 can query these views without any issues.
>
> So there comes the problem. Along with several other tables, I have 2
> views, namely fact_account and dim_costcenter (created out of the same csv)
> Here are the table definitions:
>
> describe dfs.views_costcenter.fact_account;
> +----------------+------------+--------------+
> |  COLUMN_NAME   | DATA_TYPE  | IS_NULLABLE  |
> +----------------+------------+--------------+
> | account_id     | INTEGER    | YES          |
> | costcenter_id  | INTEGER    | YES          |
> | account_date   | DATE       | YES          |
> | account_value  | INTEGER    | YES          |
> +----------------+------------+--------------+
>
> describe dfs.views_costcenter.dim_costcenter;
> +-----------------------+------------+--------------+
> |      COLUMN_NAME      | DATA_TYPE  | IS_NULLABLE  |
> +-----------------------+------------+--------------+
> | costcenter_id         | INTEGER    | YES          |
> | costcenter_desc       | VARCHAR    | YES          |
> | costcenter_name_id    | INTEGER    | YES          |
> | costcenter_name_desc  | VARCHAR    | YES          |
> | department_id         | INTEGER    | YES          |
> | division_id           | INTEGER    | YES          |
> | area_id               | INTEGER    | YES          |
> +-----------------------+------------+--------------+
>
> Both tables have 582 rows.
>
> So I need to join these two tables and run some aggregations on them in
> order to create a report. I have the following query:
>
> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> sum(a11.account_value) as sss from dfs.views_costcenter.fact_account a11
> join dfs.views_costcenter.dim_costcenter a12 on (a11.costcenter_id =
> a12.costcenter_id) group by a11.costcenter_id;
>
> When I run this query, the execution planner throws a huge exception as you
> can see below. However, I've found a strange solution to that. If I
> exchange the order of the tables within the join, ie.
>
> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> sum(a11.account_value) as sss from dfs.views_costcenter.dim_costcenter a12
> join dfs.views_costcenter.fact_account a11 on (a11.costcenter_id =
> a12.costcenter_id) group by a11.costcenter_id;
>
> It works perfectly. So in summary, if I write t2 join t1 instead of t1 join
> t2 and change nothing else, it works like a charm. As inner join is
> commutative and associative, this was completely unexpected for me. Can
> someone confirm if this is a bug? I didn't want to file a bug to JIRA
> before asking you guys here first.
>
> Thanks in advance for your help.
>
> Below you can find the exception: (as the exception is huge, I've only
> posted part of it here. Please let me know if you need the complete
> exception)
>
>
> Error: SYSTEM ERROR:
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> [rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be implemented;
> planner state:
>
> Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]],
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])],
> sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
>   DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]): rowcount
> = 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 15144
>     DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]],
> group=[{0}], costcenter_desc=[MAX($1)], costcenter_name_desc=[MAX($2)],
> sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 15142
>       DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]],
> costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7],
> account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
>         DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0, 2,
> 4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9],
> account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1],
> costcenter_name_id=[$2], costcenter_name_desc=[$3], department_id=[$4],
> division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost = {0.0
> rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
>           DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0, 2,
> 4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount = 1018.0,
> cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network, 1791.68
> memory}, id = 15136
>             DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0, 2,
> 4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6],
> dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount =
> 1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0 io, 0.0
> network, 0.0 memory}, id = 15127
>
> DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]],
> costcenter_id=[CAST(ITEM($0, 1)):INTEGER], costcenter_desc=[CAST(ITEM($0,
> 12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE
> "ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0,
> 11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45)
> CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
> department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0,
> 7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0,
> cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 15125
>
> DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]], table=[[dfs,
> raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
> [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
> columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10],
> `columns`[9], `columns`[7], `columns`[5]],
> files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount =
> 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 15064
>             DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0, 1,
> 2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
> dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828 rows,
> 101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134
>
> DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]],
> group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0
> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132
>
> DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]],
> account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0,
> 1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE],
> account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0, cumulative
> cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id =
> 15130
>
> DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]], table=[[dfs,
> raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
> [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
> columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]],
> files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount =
> 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network,
> 0.0 memory}, id = 15074
>
> Sets:
> Set#16, type: RecordType(ANY columns)
> rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064,
> importance=0.4304672100000001
>
>
> --
>
> *M. Engin Sözer*
> Junior Datawarehouse Manager
> [email protected]
>
> Goodgame Studios
> Theodorstr. 42-90, House 9
> 22761 Hamburg, Germany
> Phone: +49 (0)40 219 880 -0
> *www.goodgamestudios.com <http://www.goodgamestudios.com>*
>
> Goodgame Studios is a branch of Altigi GmbH
> Altigi GmbH, District court Hamburg, HRB 99869
> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
> Ritter
>

Re: Exception during Execution Planning for joins

Reply via email to