Re: Exception during Execution Planning for joins

Rajkumar Singh Sun, 21 Jun 2015 22:07:34 -0700

Hi Hao

I have only one row in my sample data; please find below the queries I used
to reproduce it.


rsingh@Administrators-MacBook-Pro-4 ~/Downloads/apache-drill-1.0.0$ cat sample-data/master-data.csv
1,1,20.4.09,201,Orci Donec Nibh PC,1,[email protected],200,Dai Woodward,300,Hannah Sims,400,Abdul


        CREATE VIEW dfs.tmp.`fact_a` AS
        select distinct CAST(columns[0] as INTEGER) as b_id,
               CAST(columns[1] as INTEGER) as c_id,
               CAST(columns[2] as DATE) as a_date,
               CAST(columns[3] AS INTEGER) as a_value
        from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
        order by b_id, c_id, a_date;


        CREATE VIEW dfs.tmp.`dim_c` AS
        select CAST(columns[1] as INTEGER) as c_id,
               CAST(columns[12] as VARCHAR) as c_desc,
               CAST(columns[11] as INTEGER) as c4_id,
               CAST(columns[10] as VARCHAR) as c4_desc,
               CAST(columns[9] as INTEGER) as c3_id,
               CAST(columns[7] as INTEGER) as c2_id,
               CAST(columns[5] as INTEGER) as c1_id
        from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
        order by c_id, c4_id, c3_id, c2_id, c1_id;


        select a11.c_id  c_id,
        max(a12.c_desc)  c_desc,
        max(a12.c4_desc)  c4_desc,
        sum(a11.a_value)  a_value
        from dfs.tmp.`fact_a` a11
        join dfs.tmp.`dim_c` a12 
          on (a11.c_id =a12.c_id)
        group by a11.c_id;
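As a reference point for what a correct plan should return, the two views and the join above can be replayed against the one-row sample. The sketch below is minimal and rests on a few assumptions:

- SQLite stands in for Drill. It is not Drill's planner or type system, so it cannot reproduce the CannotPlanException. It only shows the logical answer.
- The table name `master_data` and the columns `c0`..`c12` are hypothetical stand-ins for Drill's `columns[0]`..`columns[12]`.
- `a_date` stays as text, because SQLite has no DATE type.

```python
import sqlite3

# The single sample row from master-data.csv. Field 6 is the mailing-list
# archive's redaction placeholder; none of the views read it.
ROW = "1,1,20.4.09,201,Orci Donec Nibh PC,1,[email protected],200,Dai Woodward,300,Hannah Sims,400,Abdul"


def run_report():
    """Build fact_a/dim_c over the sample row and run the join + GROUP BY."""
    conn = sqlite3.connect(":memory:")
    fields = ROW.split(",")
    # Hypothetical stand-in for Drill's columns[] array: one column per field.
    col_defs = ", ".join(f"c{i}" for i in range(len(fields)))
    conn.execute(f"CREATE TABLE master_data ({col_defs})")
    placeholders = ", ".join(["?"] * len(fields))
    conn.execute(f"INSERT INTO master_data VALUES ({placeholders})", fields)
    # Same shape as fact_a; a_date is kept as text (SQLite has no DATE type).
    conn.execute(
        """CREATE VIEW fact_a AS
           SELECT DISTINCT CAST(c0 AS INTEGER) AS b_id, CAST(c1 AS INTEGER) AS c_id,
                  c2 AS a_date, CAST(c3 AS INTEGER) AS a_value
           FROM master_data ORDER BY b_id, c_id, a_date"""
    )
    # Same column mapping as dim_c (columns[1], [12], [11], [10], [9], [7], [5]).
    conn.execute(
        """CREATE VIEW dim_c AS
           SELECT CAST(c1 AS INTEGER) AS c_id, CAST(c12 AS VARCHAR) AS c_desc,
                  CAST(c11 AS INTEGER) AS c4_id, CAST(c10 AS VARCHAR) AS c4_desc,
                  CAST(c9 AS INTEGER) AS c3_id, CAST(c7 AS INTEGER) AS c2_id,
                  CAST(c5 AS INTEGER) AS c1_id
           FROM master_data ORDER BY c_id, c4_id, c3_id, c2_id, c1_id"""
    )
    # The query from this thread, unchanged apart from the view names.
    rows = conn.execute(
        """SELECT a11.c_id AS c_id, MAX(a12.c_desc) AS c_desc,
                  MAX(a12.c4_desc) AS c4_desc, SUM(a11.a_value) AS a_value
           FROM fact_a a11 JOIN dim_c a12 ON (a11.c_id = a12.c_id)
           GROUP BY a11.c_id"""
    ).fetchall()
    conn.close()
    return rows


print(run_report())  # [(1, 'Abdul', 'Hannah Sims', 201)]
```

On this row the join matches on `c_id = 1` and yields a single group. Drill's column types may differ from SQLite's; for example, its plan types the description columns as VARCHAR(1). Only the join and aggregate logic carries over.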



Rajkumar Singh
MapR Technologies


> On Jun 21, 2015, at 1:34 AM, Hao Zhu <[email protected]> wrote:
> 
> Nice. How many rows did you create?
> I'm just curious what the difference is from my reproduction.
> 
> Here are mine:
> CREATE VIEW view_fact_account
> AS select cast(columns[0] as int) account_id,cast(columns[1] as int)
> costcenter_id, cast(columns[2] as date) account_date,cast(columns[3] as
> int) account_value from dfs.drill.fact_account;
> 
> 
> CREATE VIEW view_dim_costcenter AS
> select cast(columns[0] as int) costcenter_id,cast(columns[1] as varchar)
> costcenter_desc, cast(columns[2] as int) costcenter_name_id,
> cast(columns[3] as varchar) costcenter_name_desc,cast(columns[4] as int)
> department_id, cast(columns[5] as int) division_id, cast(columns[6] as int)
> area_id from dfs.drill.dim_costcenter;
> 
> Both tables have only 1 row.
> 
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_dim_costcenter ;
> +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> | costcenter_id  | costcenter_desc  | costcenter_name_id  | costcenter_name_desc  | department_id  | division_id  | area_id  |
> +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> | 2              | a                | 2                   | a                     | 3              | 4            | 5        |
> +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> 1 row selected (0.255 seconds)
> 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_fact_account ;
> +-------------+----------------+---------------+----------------+
> | account_id  | costcenter_id  | account_date  | account_value  |
> +-------------+----------------+---------------+----------------+
> | 1           | 2              | 2015-01-01    | 3              |
> +-------------+----------------+---------------+----------------+
> 1 row selected (0.141 seconds)
> 
> Thanks,
> Hao
> 
> On Sat, Jun 20, 2015 at 9:31 AM, Rajkumar Singh <[email protected]> wrote:
> 
>> Hi Hao
>> 
>> I tried to reproduce the issue and was able to reproduce it. I am running
>> drill-1.0.0 in embedded mode with a small data set on my Mac.
>> 
>> For the bad SQL I am getting a Calcite CannotPlanException (same as the
>> error stack); for the good SQL, please find the explain plan below.
>> 
>> 
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
>> | 00-00    Screen : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc,
>> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
>> {15.1 rows, 129.1 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1346
>> 00-01      Project(c_id=[$0], c_desc=[$1], c4_desc=[$2], a_value=[$3]) :
>> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc,
>> INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0 cpu,
>> 0.0 io, 0.0 network, 187.2 memory}, id = 1345
>> 00-02        HashAgg(group=[{0}], c_desc=[MAX($1)], c4_desc=[MAX($2)],
>> a_value=[SUM($3)]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc,
>> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
>> {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1344
>> 00-03          Project(c_id=[$8], c_desc=[$1], c4_desc=[$3],
>> a_value=[$10]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc,
>> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
>> {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1343
>> 00-04            HashJoin(condition=[=($8, $0)], joinType=[inner]) :
>> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
>> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id, INTEGER
>> b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0,
>> cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory},
>> id = 1342
>> 00-05              Project(b_id=[$0], c_id0=[$1], a_date=[$2],
>> a_value=[$3]) : rowType = RecordType(INTEGER b_id, INTEGER c_id0, DATE
>> a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows, 35.0
>> cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1341
>> 00-07                SelectionVectorRemover : rowType = RecordType(INTEGER
>> b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0,
>> cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0 memory},
>> id = 1340
>> 00-09                  Sort(sort0=[$0], sort1=[$1], sort2=[$2],
>> dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id,
>> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
>> cost = {7.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1339
>> 00-11                    SelectionVectorRemover : rowType =
>> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
>> rowcount = 1.0, cumulative cost = {6.0 rows, 34.0 cpu, 0.0 io, 0.0 network,
>> 64.0 memory}, id = 1338
>> 00-13                      Sort(sort0=[$0], sort1=[$1], sort2=[$2],
>> dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id,
>> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
>> cost = {5.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1337
>> 00-14                        StreamAgg(group=[{0, 1, 2, 3}]) : rowType =
>> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
>> rowcount = 1.0, cumulative cost = {4.0 rows, 33.0 cpu, 0.0 io, 0.0 network,
>> 32.0 memory}, id = 1336
>> 00-15                          Sort(sort0=[$0], sort1=[$1], sort2=[$2],
>> sort3=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC]) : rowType =
>> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
>> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network,
>> 32.0 memory}, id = 1335
>> 00-16                            Project(b_id=[CAST(ITEM($0, 0)):INTEGER],
>> c_id=[CAST(ITEM($0, 1)):INTEGER], a_date=[CAST(ITEM($0, 2)):DATE],
>> a_value=[CAST(ITEM($0, 3)):INTEGER]) : rowType = RecordType(INTEGER b_id,
>> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
>> cost = {2.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1334
>> 00-17                              Scan(groupscan=[EasyGroupScan
>> [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv,
>> numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2],
>> `columns`[3]],
>> files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]])
>> : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0
>> rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1333
>> 00-06              SelectionVectorRemover : rowType = RecordType(INTEGER
>> c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id,
>> INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {4.0 rows,
>> 30.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1332
>> 00-08                Sort(sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5],
>> sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]) :
>> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
>> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount
>> = 1.0, cumulative cost = {3.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 56.0
>> memory}, id = 1331
>> 00-10                  Project(c_id=[CAST(ITEM($0, 1)):INTEGER],
>> c_desc=[CAST(ITEM($0, 12)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
>> "ISO-8859-1$en_US$primary"], c4_id=[CAST(ITEM($0, 11)):INTEGER],
>> c4_desc=[CAST(ITEM($0, 10)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
>> "ISO-8859-1$en_US$primary"], c3_id=[CAST(ITEM($0, 9)):INTEGER],
>> c2_id=[CAST(ITEM($0, 7)):INTEGER], c1_id=[CAST(ITEM($0, 5)):INTEGER]) :
>> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
>> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount
>> = 1.0, cumulative cost = {2.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 0.0
>> memory}, id = 1330
>> 00-12                    Scan(groupscan=[EasyGroupScan
>> [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv,
>> numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11],
>> `columns`[10], `columns`[9], `columns`[7], `columns`[5]],
>> files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]])
>> : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0
>> rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1329
>> | {
>>  "head" : {
>>    "version" : 1,
>>    "generator" : {
>>      "type" : "ExplainHandler",
>>      "info" : ""
>>    },
>>    "type" : "APACHE_DRILL_PHYSICAL",
>>    "options" : [ ],
>>    "queue" : 0,
>>    "resultMode" : "EXEC"
>>  },
>>  "graph" : [ {
>>    "pop" : "fs-scan",
>>    "@id" : 12,
>>    "userName" : "rsingh",
>>    "files" : [
>> "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv"
>> ],
>>    "storage" : {
>>      "type" : "file",
>>      "enabled" : true,
>>      "connection" : "file:///",
>>      "workspaces" : {
>>        "root" : {
>>          "location" : "/",
>>          "writable" : false,
>>          "defaultInputFormat" : null
>>        },
>>        "tmp" : {
>>          "location" : "/tmp",
>>          "writable" : true,
>>          "defaultInputFormat" : null
>>        }
>>      },
>>      "formats" : {
>>        "psv" : {
>>          "type" : "text",
>>          "extensions" : [ "tbl" ],
>>          "delimiter" : "|"
>>        },
>>        "csv" : {
>>          "type" : "text",
>>          "extensions" : [ "csv" ],
>>          "delimiter" : ","
>>        },
>>        "tsv" : {
>>          "type" : "text",
>>          "extensions" : [ "tsv" ],
>>          "delimiter" : "\t"
>>        },
>>        "parquet" : {
>>          "type" : "parquet"
>>        },
>>        "json" : {
>>          "type" : "json"
>>        },
>>        "avro" : {
>>          "type" : "avro"
>>        }
>>      }
>>    },
>>    "format" : {
>>      "type" : "text",
>>      "extensions" : [ "csv" ],
>>      "delimiter" : ","
>>    },
>>    "columns" : [ "`columns`[1]", "`columns`[12]", "`columns`[11]",
>> "`columns`[10]", "`columns`[9]", "`columns`[7]", "`columns`[5]" ],
>>    "selectionRoot" :
>> "/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv",
>>    "cost" : 1.0
>>  }, {
>>    "pop" : "project",
>>    "@id" : 10,
>>    "exprs" : [ {
>>      "ref" : "`c_id`",
>>      "expr" : "cast( (`columns`[1] ) as INT )"
>>    }, {
>>      "ref" : "`c_desc`",
>>      "expr" : "cast( (`columns`[12] ) as VARCHAR(1) )"
>>    }, {
>>      "ref" : "`c4_id`",
>>      "expr" : "cast( (`columns`[11] ) as INT )"
>>    }, {
>>      "ref" : "`c4_desc`",
>>      "expr" : "cast( (`columns`[10] ) as VARCHAR(1) )"
>>    }, {
>>      "ref" : "`c3_id`",
>>      "expr" : "cast( (`columns`[9] ) as INT )"
>>    }, {
>>      "ref" : "`c2_id`",
>>      "expr" : "cast( (`columns`[7] ) as INT )"
>>    }, {
>>      "ref" : "`c1_id`",
>>      "expr" : "cast( (`columns`[5] ) as INT )"
>>    } ],
>>    "child" : 12,
>>    "initialAllocation" : 1000000,
>>    "maxAllocation" : 10000000000,
>>    "cost" : 1.0
>>  }, {
>>    "pop" : "external-sort",
>>    "@id" : 8,
>>    "child" : 10,
>>    "orderings" : [ {
>>      "expr" : "`c_id`",
>>      "order" : "ASC",
>>      "nullDirection" : "UNSPECIFIED"
>>    }, {
>>      "expr" : "`c4_id`",
>>      "order" : "ASC",
>>      "nullDirection" : "UNSPECIFIED"
>>    }, {
>>      "expr" : "`c3_id`",
>>      "order" : "ASC",
>>      "nullDirection" : "UNSPECIFIED"
>>    }, {
>>      "expr" : "`c2_id`",
>>      "order" : "ASC",
>>      "nullDirection" : "UNSPECIFIED"
>>    }, {
>>      "expr" : "`c1_id`",
>>      "order" : "ASC",
>>      "nullDirection" : "UNSPECIFIED"
>>    } ],
>>    "reverse" : false,
>>    "initialAllocation" : 20000000,
>>    "maxAllocation" : 10000000000,
>>    "cost" : 1.0
>>  }, {
>>    "pop" : "selection-vector-remover",
>>    "@id" : 6,
>>    "child" : 8,
>>    "initialAllocation" : 1000000,
>>    "maxAllocation" : 10000000000,
>>    "cost" : 1.0
>>  }, {
>>    "pop" : "fs-scan",
>>    "@id" : 17,
>>    "userName" : "rsingh",
>>    "files" : [
>> "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv"
>> ],
>>    "storage" : {
>>      "type" : "file",
>>      "enabled" : true,
>>      "connection" : "file:///",
>>      "workspaces" : {
>>        "root" : {
>>          "location" : "/",
>>          "writable" : false,
>>          "defaultInputFormat" : null
>>        },
>>        "tmp" : {
>>          "location" : "/tmp",
>>          "writable" : true,
>>          "defaultInputFormat" : null
>>        }
>>      },
>>      "formats" : {
>>        "psv" : {
>>          "type" : "text",
>>          "extensions" : [ "tbl" ],
>>          "delimiter" : "|"
>>        },
>>        "csv" : {
>>          "type" : "text",
>>          "extensions" : [ "csv" ],
>>          "delimiter" : ","
>>        },
>>        "tsv" : {
>> |
>> +-------------
>> 
>> 
>> Rajkumar Singh
>> MapR Technologies
>> 
>> 
>>> On Jun 20, 2015, at 9:07 PM, Hao Zhu <[email protected]> wrote:
>>> 
>>> Hello,
>>> 
>>> I tried to create the same data in my lab with Drill 1.0 on MapR 4.1;
>>> however, both SQL queries work fine on my end:
>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
>>> sum(a11.account_value) as sss from view_fact_account a11
>>> join view_dim_costcenter a12 on (a11.costcenter_id =
>>> a12.costcenter_id) group by a11.costcenter_id;
>>> +----------------+------------------+-----------------------+------+
>>> | costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
>>> +----------------+------------------+-----------------------+------+
>>> | 2              | a                | a                     | 3    |
>>> +----------------+------------------+-----------------------+------+
>>> 1 row selected (0.302 seconds)
>>> 
>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
>>> sum(a11.account_value) as sss from view_dim_costcenter a12
>>> join view_fact_account a11 on (a11.costcenter_id =
>>> a12.costcenter_id) group by a11.costcenter_id;
>>> 
>>> +----------------+------------------+-----------------------+------+
>>> | costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
>>> +----------------+------------------+-----------------------+------+
>>> | 2              | a                | a                     | 3    |
>>> +----------------+------------------+-----------------------+------+
>>> 1 row selected (0.209 seconds)
>>> 
>>> To narrow down the issue, could you test a few things:
>>> 1. Does this issue happen only with usr2? Do you have the same issue
>>> with usr1 as well?
>>> I just want to confirm whether this issue is related to impersonation or
>>> permissions.
>>> 
>>> 2. Does this issue happen only with the 582-row tables?
>>> I mean, if the 2 tables have fewer rows, does the issue still reproduce?
>>> In my test, I created only 1 row.
>>> I just want to know whether this issue is data-driven.
>>> 
>>> 3. Could you attach the good SQL and bad SQL profiles, so that the SQL plan
>>> is more readable?
>>> 
>>> Thanks,
>>> Hao
>>> 
>>> On Thu, Jun 18, 2015 at 6:46 AM, Mustafa Engin Sözer <
>>> [email protected]> wrote:
>>> 
>>>> Hi everyone,
>>>> 
>>>> I started an earlier thread about this issue, but no resolution came out
>>>> of it and the issue couldn't be reproduced. Let me re-describe the issue
>>>> and my cluster:
>>>> 
>>>> Currently I have a 5-node MapR cluster on AWS, including Drill. On both
>>>> sides security is enabled, and on Drill impersonation is also enabled.
>>>> The only other configuration I changed in Drill was the default
>>>> permissions for new views, which I set to 750. I'm using MapR-FS; our MapR
>>>> version is 4.1.0 and our Drill version is 1.0.0.
>>>> 
>>>> So the process goes like this:
>>>> 
>>>> I have two users involved in this process, called usr1 and usr2. usr1 is
>>>> effectively an admin for the raw data, whereas usr2 is not allowed to
>>>> access the raw data.
>>>> 
>>>> usr1 writes 3 CSV files to the /raw/costcenter volume and creates a
>>>> relational model using Drill views. These views are written to
>>>> /views/costcenter, to which usr2 has access, so usr2 can query these views
>>>> without any issues.
>>>> 
>>>> Here is where the problem comes in. Along with several other tables, I
>>>> have 2 views, namely fact_account and dim_costcenter (created from the
>>>> same CSV). Here are the table definitions:
>>>> 
>>>> describe dfs.views_costcenter.fact_account;
>>>> +----------------+------------+--------------+
>>>> |  COLUMN_NAME   | DATA_TYPE  | IS_NULLABLE  |
>>>> +----------------+------------+--------------+
>>>> | account_id     | INTEGER    | YES          |
>>>> | costcenter_id  | INTEGER    | YES          |
>>>> | account_date   | DATE       | YES          |
>>>> | account_value  | INTEGER    | YES          |
>>>> +----------------+------------+--------------+
>>>> 
>>>> describe dfs.views_costcenter.dim_costcenter;
>>>> +-----------------------+------------+--------------+
>>>> |      COLUMN_NAME      | DATA_TYPE  | IS_NULLABLE  |
>>>> +-----------------------+------------+--------------+
>>>> | costcenter_id         | INTEGER    | YES          |
>>>> | costcenter_desc       | VARCHAR    | YES          |
>>>> | costcenter_name_id    | INTEGER    | YES          |
>>>> | costcenter_name_desc  | VARCHAR    | YES          |
>>>> | department_id         | INTEGER    | YES          |
>>>> | division_id           | INTEGER    | YES          |
>>>> | area_id               | INTEGER    | YES          |
>>>> +-----------------------+------------+--------------+
>>>> 
>>>> Both tables have 582 rows.
>>>> 
>>>> So I need to join these two tables and run some aggregations on them in
>>>> order to create a report. I have the following query:
>>>> 
>>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
>>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
>>>> sum(a11.account_value) as sss from dfs.views_costcenter.fact_account a11
>>>> join dfs.views_costcenter.dim_costcenter a12 on (a11.costcenter_id =
>>>> a12.costcenter_id) group by a11.costcenter_id;
>>>> 
>>>> When I run this query, the execution planner throws a huge exception, as
>>>> you can see below. However, I've found a strange workaround: if I swap the
>>>> order of the tables within the join, i.e.:
>>>> 
>>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
>>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
>>>> sum(a11.account_value) as sss from dfs.views_costcenter.dim_costcenter a12
>>>> join dfs.views_costcenter.fact_account a11 on (a11.costcenter_id =
>>>> a12.costcenter_id) group by a11.costcenter_id;
>>>> 
>>>> It works perfectly. In summary, if I write t2 join t1 instead of t1 join
>>>> t2 and change nothing else, it works like a charm. Since inner join is
>>>> commutative and associative, this was completely unexpected for me. Can
>>>> someone confirm whether this is a bug? I didn't want to file a bug in JIRA
>>>> before asking here first.
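A toy model illustrates why swapping the join inputs should not change the answer. This is a minimal sketch, not Drill's implementation. It uses a hand-written hash join in Python over hypothetical rows shaped like fact_account and dim_costcenter, and all values are made up for illustration.

Swapping the sides only changes which input is hashed and which is streamed. The bag of joined rows, and so the grouped SUM, stays the same. That is why a planning failure that depends only on the written join order looks like an optimizer issue rather than a difference in the query's meaning.

```python
from collections import Counter


def hash_join(probe, build, key):
    """Inner equi-join on `key`: hash the `build` side, then stream `probe`.

    Output rows merge both inputs. The only shared column is the join key,
    whose values are equal on matching rows, so merge order does not matter.
    """
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    return [{**b, **p} for p in probe for b in table.get(p[key], [])]


def as_multiset(rows):
    # Compare results as bags of rows, ignoring row order and column order.
    return Counter(frozenset(r.items()) for r in rows)


def sum_by_costcenter(rows):
    """GROUP BY costcenter_id with SUM(account_value), like the `sss` column."""
    totals = {}
    for row in rows:
        key = row["costcenter_id"]
        totals[key] = totals.get(key, 0) + row["account_value"]
    return totals


# Hypothetical rows shaped like fact_account and dim_costcenter.
fact = [
    {"account_id": 1, "costcenter_id": 2, "account_value": 3},
    {"account_id": 2, "costcenter_id": 2, "account_value": 5},
    {"account_id": 3, "costcenter_id": 9, "account_value": 7},  # no match
]
dim = [
    {"costcenter_id": 2, "costcenter_desc": "a"},
    {"costcenter_id": 4, "costcenter_desc": "b"},  # no match
]

fact_join_dim = hash_join(fact, dim, "costcenter_id")  # "t1 join t2"
dim_join_fact = hash_join(dim, fact, "costcenter_id")  # "t2 join t1"
print(as_multiset(fact_join_dim) == as_multiset(dim_join_fact))  # True
print(sum_by_costcenter(fact_join_dim), sum_by_costcenter(dim_join_fact))  # {2: 8} {2: 8}
```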
>>>> 
>>>> Thanks in advance for your help.
>>>> 
>>>> Below you can find the exception: (as the exception is huge, I've only
>>>> posted part of it here. Please let me know if you need the complete
>>>> exception)
>>>> 
>>>> 
>>>> Error: SYSTEM ERROR: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node [rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:
>>>> 
>>>> Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
>>>> Original rel:
>>>> AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
>>>> DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]): rowcount = 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15144
>>>>   DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]], group=[{0}], costcenter_desc=[MAX($1)], costcenter_name_desc=[MAX($2)], sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15142
>>>>     DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]], costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7], account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
>>>>       DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9], account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1], costcenter_name_id=[$2], costcenter_name_desc=[$3], department_id=[$4], division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
>>>>         DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount = 1018.0, cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network, 1791.68 memory}, id = 15136
>>>>           DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount = 1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15127
>>>>             DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], costcenter_desc=[CAST(ITEM($0, 12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0, 11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0, 7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15125
>>>>               DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10], `columns`[9], `columns`[7], `columns`[5]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064
>>>>           DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0, 1, 2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828 rows, 101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134
>>>>             DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]], group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132
>>>>               DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]], account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE], account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15130
>>>>                 DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15074
>>>> 
>>>> Sets:
>>>> Set#16, type: RecordType(ANY columns)
>>>> rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064, importance=0.4304672100000001
>>>> 
>>>> 
>>>> --
>>>> 
>>>> *M. Engin Sözer*
>>>> Junior Datawarehouse Manager
>>>> [email protected]
>>>> 
>>>> Goodgame Studios
>>>> Theodorstr. 42-90, House 9
>>>> 22761 Hamburg, Germany
>>>> Phone: +49 (0)40 219 880 -0
>>>> *www.goodgamestudios.com <http://www.goodgamestudios.com>*
>>>> 
>>>> Goodgame Studios is a branch of Altigi GmbH
>>>> Altigi GmbH, District court Hamburg, HRB 99869
>>>> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
>>>> Ritter
>>>> 
>> 
>>
