Re: Exception during Execution Planning for joins

Mustafa Engin Sözer Mon, 22 Jun 2015 01:35:20 -0700

Hi everyone,

Thanks for the help. I was unable to read through the mails over the
weekend, so I just saw that the problem is confirmed here. Do you need
anything from me in order to pinpoint the problem?


Is a JIRA ticket already created or should I create one myself?

Thanks again.

On 22 June 2015 at 07:06, Rajkumar Singh <[email protected]> wrote:

> Hi Hao
>
> I have only one row in my sample data. Please find below the queries I
> used to reproduce it.
>
> rsingh@Administrators-MacBook-Pro-4 ~/Downloads/apache-drill-1.0.0$ cat
> sample-data/master-data.csv
> 1,1,20.4.09,201,Orci Donec Nibh 
> PC,1,[email protected],200,Dai
> Woodward,300,Hannah Sims,400,Abdul
>
>
>         CREATE VIEW dfs.tmp.`fact_a` AS select distinct CAST(columns[0]
>        as INTEGER) as b_id, CAST(columns[1] as INTEGER) as c_id,
>        CAST(columns[2] as DATE) as a_date, CAST(columns[3] AS INTEGER)
>        as a_value from
> dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
> order by
>        b_id, c_id, a_date;
>
>
>         CREATE VIEW dfs.tmp.`dim_c` AS select CAST(columns[1] as
>        INTEGER) as c_id, CAST(columns[12] as VARCHAR) as c_desc,
>        CAST(columns[11] as INTEGER) as c4_id, CAST(columns[10] as
>        VARCHAR) as c4_desc, CAST(columns[9] as INTEGER) as c3_id,
>        CAST(columns[7] as INTEGER) as c2_id, CAST(columns[5] as
>        INTEGER) as c1_id from
> dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
> order
>        by c_id, c4_id, c3_id, c2_id, c1_id;
>
>
>         select a11.c_id  c_id,
>         max(a12.c_desc)  c_desc,
>         max(a12.c4_desc)  c4_desc,
>         sum(a11.a_value)  a_value
>         from dfs.tmp.`fact_a` a11
>         join dfs.tmp.`dim_c` a12
>           on (a11.c_id =a12.c_id)
>         group by a11.c_id;
>
>
>
> Rajkumar Singh
> MapR Technologies
>
>
> > On Jun 21, 2015, at 1:34 AM, Hao Zhu <[email protected]> wrote:
> >
> > Nice. How many rows did you create?
> > Just curious how it differs from my reproduction attempt.
> >
> > Here are mine:
> > CREATE VIEW view_fact_account
> > AS select cast(columns[0] as int) account_id,cast(columns[1] as int)
> > costcenter_id, cast(columns[2] as date) account_date,cast(columns[3] as
> > int) account_value from dfs.drill.fact_account;
> >
> >
> > CREATE VIEW view_dim_costcenter AS
> > select cast(columns[0] as int) costcenter_id,cast(columns[1] as varchar)
> > costcenter_desc, cast(columns[2] as int) costcenter_name_id,
> > cast(columns[3] as varchar) costcenter_name_desc,cast(columns[4] as int)
> > department_id, cast(columns[5] as int) division_id, cast(columns[6] as int)
> > area_id from dfs.drill.dim_costcenter;
> >
> > Both tables have only 1 row.
> >
> > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from
> > view_dim_costcenter ;
> >
> > +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> > | costcenter_id  | costcenter_desc  | costcenter_name_id  | costcenter_name_desc  | department_id  | division_id  | area_id  |
> > +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> > | 2              | a                | 2                   | a                     | 3              | 4            | 5        |
> > +----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
> > 1 row selected (0.255 seconds)
> > 0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from
> > view_fact_account ;
> > +-------------+----------------+---------------+----------------+
> > | account_id  | costcenter_id  | account_date  | account_value  |
> > +-------------+----------------+---------------+----------------+
> > | 1           | 2              | 2015-01-01    | 3              |
> > +-------------+----------------+---------------+----------------+
> > 1 row selected (0.141 seconds)
> >
> > Thanks,
> > Hao
> >
> > On Sat, Jun 20, 2015 at 9:31 AM, Rajkumar Singh <[email protected]>
> wrote:
> >
> >> Hi Hao
> >>
> >> I tried to reproduce the issue and was able to. I am running
> >> drill-1.0.0 in embedded mode with a small data set on my Mac.
> >>
> >> For the bad SQL I get a Calcite CannotPlanException (same as the
> >> error stack). For the good SQL, the explain plan is below.
> >>
> >>
> >>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
> >> | 00-00    Screen : rowType = RecordType(INTEGER c_id, VARCHAR(1)
> c_desc,
> >> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
> >> {15.1 rows, 129.1 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1346
> >> 00-01      Project(c_id=[$0], c_desc=[$1], c4_desc=[$2], a_value=[$3]) :
> >> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1)
> c4_desc,
> >> INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0
> cpu,
> >> 0.0 io, 0.0 network, 187.2 memory}, id = 1345
> >> 00-02        HashAgg(group=[{0}], c_desc=[MAX($1)], c4_desc=[MAX($2)],
> >> a_value=[SUM($3)]) : rowType = RecordType(INTEGER c_id, VARCHAR(1)
> c_desc,
> >> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
> >> {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1344
> >> 00-03          Project(c_id=[$8], c_desc=[$1], c4_desc=[$3],
> >> a_value=[$10]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc,
> >> VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost =
> >> {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1343
> >> 00-04            HashJoin(condition=[=($8, $0)], joinType=[inner]) :
> >> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
> >> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id, INTEGER
> >> b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0,
> >> cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6
> memory},
> >> id = 1342
> >> 00-05              Project(b_id=[$0], c_id0=[$1], a_date=[$2],
> >> a_value=[$3]) : rowType = RecordType(INTEGER b_id, INTEGER c_id0, DATE
> >> a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows,
> 35.0
> >> cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1341
> >> 00-07                SelectionVectorRemover : rowType =
> RecordType(INTEGER
> >> b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0,
> >> cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0
> memory},
> >> id = 1340
> >> 00-09                  Sort(sort0=[$0], sort1=[$1], sort2=[$2],
> >> dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id,
> >> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
> >> cost = {7.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1339
> >> 00-11                    SelectionVectorRemover : rowType =
> >> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
> >> rowcount = 1.0, cumulative cost = {6.0 rows, 34.0 cpu, 0.0 io, 0.0
> network,
> >> 64.0 memory}, id = 1338
> >> 00-13                      Sort(sort0=[$0], sort1=[$1], sort2=[$2],
> >> dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id,
> >> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
> >> cost = {5.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1337
> >> 00-14                        StreamAgg(group=[{0, 1, 2, 3}]) : rowType =
> >> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
> >> rowcount = 1.0, cumulative cost = {4.0 rows, 33.0 cpu, 0.0 io, 0.0
> network,
> >> 32.0 memory}, id = 1336
> >> 00-15                          Sort(sort0=[$0], sort1=[$1], sort2=[$2],
> >> sort3=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC]) : rowType =
> >> RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value):
> >> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0
> network,
> >> 32.0 memory}, id = 1335
> >> 00-16                            Project(b_id=[CAST(ITEM($0,
> 0)):INTEGER],
> >> c_id=[CAST(ITEM($0, 1)):INTEGER], a_date=[CAST(ITEM($0, 2)):DATE],
> >> a_value=[CAST(ITEM($0, 3)):INTEGER]) : rowType = RecordType(INTEGER
> b_id,
> >> INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative
> >> cost = {2.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1334
> >> 00-17                              Scan(groupscan=[EasyGroupScan
> >>
> [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv,
> >> numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2],
> >> `columns`[3]],
> >>
> files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]])
> >> : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost =
> {1.0
> >> rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1333
> >> 00-06              SelectionVectorRemover : rowType = RecordType(INTEGER
> >> c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER
> c3_id,
> >> INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {4.0
> rows,
> >> 30.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1332
> >> 00-08                Sort(sort0=[$0], sort1=[$2], sort2=[$4],
> sort3=[$5],
> >> sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC])
> :
> >> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
> >> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id):
> rowcount
> >> = 1.0, cumulative cost = {3.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 56.0
> >> memory}, id = 1331
> >> 00-10                  Project(c_id=[CAST(ITEM($0, 1)):INTEGER],
> >> c_desc=[CAST(ITEM($0, 12)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
> >> "ISO-8859-1$en_US$primary"], c4_id=[CAST(ITEM($0, 11)):INTEGER],
> >> c4_desc=[CAST(ITEM($0, 10)):VARCHAR(1) CHARACTER SET "ISO-8859-1"
> COLLATE
> >> "ISO-8859-1$en_US$primary"], c3_id=[CAST(ITEM($0, 9)):INTEGER],
> >> c2_id=[CAST(ITEM($0, 7)):INTEGER], c1_id=[CAST(ITEM($0, 5)):INTEGER]) :
> >> rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id,
> >> VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id):
> rowcount
> >> = 1.0, cumulative cost = {2.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 0.0
> >> memory}, id = 1330
> >> 00-12                    Scan(groupscan=[EasyGroupScan
> >>
> [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv,
> >> numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11],
> >> `columns`[10], `columns`[9], `columns`[7], `columns`[5]],
> >>
> files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]])
> >> : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost =
> {1.0
> >> rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1329
> >> | {
> >>  "head" : {
> >>    "version" : 1,
> >>    "generator" : {
> >>      "type" : "ExplainHandler",
> >>      "info" : ""
> >>    },
> >>    "type" : "APACHE_DRILL_PHYSICAL",
> >>    "options" : [ ],
> >>    "queue" : 0,
> >>    "resultMode" : "EXEC"
> >>  },
> >>  "graph" : [ {
> >>    "pop" : "fs-scan",
> >>    "@id" : 12,
> >>    "userName" : "rsingh",
> >>    "files" : [
> >>
> "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv"
> >> ],
> >>    "storage" : {
> >>      "type" : "file",
> >>      "enabled" : true,
> >>      "connection" : "file:///",
> >>      "workspaces" : {
> >>        "root" : {
> >>          "location" : "/",
> >>          "writable" : false,
> >>          "defaultInputFormat" : null
> >>        },
> >>        "tmp" : {
> >>          "location" : "/tmp",
> >>          "writable" : true,
> >>          "defaultInputFormat" : null
> >>        }
> >>      },
> >>      "formats" : {
> >>        "psv" : {
> >>          "type" : "text",
> >>          "extensions" : [ "tbl" ],
> >>          "delimiter" : "|"
> >>        },
> >>        "csv" : {
> >>          "type" : "text",
> >>          "extensions" : [ "csv" ],
> >>          "delimiter" : ","
> >>        },
> >>        "tsv" : {
> >>          "type" : "text",
> >>          "extensions" : [ "tsv" ],
> >>          "delimiter" : "\t"
> >>        },
> >>        "parquet" : {
> >>          "type" : "parquet"
> >>        },
> >>        "json" : {
> >>          "type" : "json"
> >>        },
> >>        "avro" : {
> >>          "type" : "avro"
> >>        }
> >>      }
> >>    },
> >>    "format" : {
> >>      "type" : "text",
> >>      "extensions" : [ "csv" ],
> >>      "delimiter" : ","
> >>    },
> >>    "columns" : [ "`columns`[1]", "`columns`[12]", "`columns`[11]",
> >> "`columns`[10]", "`columns`[9]", "`columns`[7]", "`columns`[5]" ],
> >>    "selectionRoot" :
> >>
> "/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv",
> >>    "cost" : 1.0
> >>  }, {
> >>    "pop" : "project",
> >>    "@id" : 10,
> >>    "exprs" : [ {
> >>      "ref" : "`c_id`",
> >>      "expr" : "cast( (`columns`[1] ) as INT )"
> >>    }, {
> >>      "ref" : "`c_desc`",
> >>      "expr" : "cast( (`columns`[12] ) as VARCHAR(1) )"
> >>    }, {
> >>      "ref" : "`c4_id`",
> >>      "expr" : "cast( (`columns`[11] ) as INT )"
> >>    }, {
> >>      "ref" : "`c4_desc`",
> >>      "expr" : "cast( (`columns`[10] ) as VARCHAR(1) )"
> >>    }, {
> >>      "ref" : "`c3_id`",
> >>      "expr" : "cast( (`columns`[9] ) as INT )"
> >>    }, {
> >>      "ref" : "`c2_id`",
> >>      "expr" : "cast( (`columns`[7] ) as INT )"
> >>    }, {
> >>      "ref" : "`c1_id`",
> >>      "expr" : "cast( (`columns`[5] ) as INT )"
> >>    } ],
> >>    "child" : 12,
> >>    "initialAllocation" : 1000000,
> >>    "maxAllocation" : 10000000000,
> >>    "cost" : 1.0
> >>  }, {
> >>    "pop" : "external-sort",
> >>    "@id" : 8,
> >>    "child" : 10,
> >>    "orderings" : [ {
> >>      "expr" : "`c_id`",
> >>      "order" : "ASC",
> >>      "nullDirection" : "UNSPECIFIED"
> >>    }, {
> >>      "expr" : "`c4_id`",
> >>      "order" : "ASC",
> >>      "nullDirection" : "UNSPECIFIED"
> >>    }, {
> >>      "expr" : "`c3_id`",
> >>      "order" : "ASC",
> >>      "nullDirection" : "UNSPECIFIED"
> >>    }, {
> >>      "expr" : "`c2_id`",
> >>      "order" : "ASC",
> >>      "nullDirection" : "UNSPECIFIED"
> >>    }, {
> >>      "expr" : "`c1_id`",
> >>      "order" : "ASC",
> >>      "nullDirection" : "UNSPECIFIED"
> >>    } ],
> >>    "reverse" : false,
> >>    "initialAllocation" : 20000000,
> >>    "maxAllocation" : 10000000000,
> >>    "cost" : 1.0
> >>  }, {
> >>    "pop" : "selection-vector-remover",
> >>    "@id" : 6,
> >>    "child" : 8,
> >>    "initialAllocation" : 1000000,
> >>    "maxAllocation" : 10000000000,
> >>    "cost" : 1.0
> >>  }, {
> >>    "pop" : "fs-scan",
> >>    "@id" : 17,
> >>    "userName" : "rsingh",
> >>    "files" : [
> >>
> "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv"
> >> ],
> >>    "storage" : {
> >>      "type" : "file",
> >>      "enabled" : true,
> >>      "connection" : "file:///",
> >>      "workspaces" : {
> >>        "root" : {
> >>          "location" : "/",
> >>          "writable" : false,
> >>          "defaultInputFormat" : null
> >>        },
> >>        "tmp" : {
> >>          "location" : "/tmp",
> >>          "writable" : true,
> >>          "defaultInputFormat" : null
> >>        }
> >>      },
> >>      "formats" : {
> >>        "psv" : {
> >>          "type" : "text",
> >>          "extensions" : [ "tbl" ],
> >>          "delimiter" : "|"
> >>        },
> >>        "csv" : {
> >>          "type" : "text",
> >>          "extensions" : [ "csv" ],
> >>          "delimiter" : ","
> >>        },
> >>        "tsv" : {
> >> |
> >> +-------------
> >>
> >>
> >> Rajkumar Singh
> >> MapR Technologies
> >>
> >>
> >>> On Jun 20, 2015, at 9:07 PM, Hao Zhu <[email protected]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I tried to create the same data in my lab with Drill 1.0 on MapR 4.1,
> >>> however, both SQL statements work fine on my end:
> >>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> >>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> >>> sum(a11.account_value) as sss from view_fact_account a11
> >>> join view_dim_costcenter a12 on (a11.costcenter_id =
> >>> a12.costcenter_id) group by a11.costcenter_id;
> >>> +----------------+------------------+-----------------------+------+
> >>> | costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
> >>> +----------------+------------------+-----------------------+------+
> >>> | 2              | a                | a                     | 3    |
> >>> +----------------+------------------+-----------------------+------+
> >>> 1 row selected (0.302 seconds)
> >>>
> >>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> >>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> >>> sum(a11.account_value) as sss from view_dim_costcenter a12
> >>> join view_fact_account a11 on (a11.costcenter_id =
> >>> a12.costcenter_id) group by a11.costcenter_id;
> >>>
> >>> +----------------+------------------+-----------------------+------+
> >>> | costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
> >>> +----------------+------------------+-----------------------+------+
> >>> | 2              | a                | a                     | 3    |
> >>> +----------------+------------------+-----------------------+------+
> >>> 1 row selected (0.209 seconds)
> >>>
> >>> To narrow down the issue, could you test something:
> >>> 1. Is this issue only happening with user2? Do you have the same issue
> >>> using user1 also?
> >>> Just want to confirm if this issue is related to impersonation or
> >>> permission.
> >>>
> >>> 2. Does this issue only happen with the 582-row tables?
> >>> I mean, if the 2 tables have fewer rows, can the issue still be reproduced?
> >>> In my test, I only created 1 row.
> >>> I just want to know whether this issue is data-driven or not.
> >>>
> >>> 3. Could you attach the good SQL and bad SQL profiles, so that the SQL
> >>> plan is more readable?
> >>>
> >>> Thanks,
> >>> Hao
> >>>
> >>> On Thu, Jun 18, 2015 at 6:46 AM, Mustafa Engin Sözer <
> >>> [email protected]> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> I've had an earlier topic regarding this issue but no resolution came
> >>>> out of this and it couldn't be reproduced. Let me re-describe the issue
> >>>> and my cluster:
> >>>>
> >>>> Currently I have a 5-node MapR cluster on AWS, including Drill. On both
> >>>> sides, security is enabled, and on Drill, impersonation is also enabled.
> >>>> The only other configuration I changed in Drill was the new views
> >>>> permissions, which I set to 750. I'm using maprfs; our MapR version is
> >>>> 4.1.0 and our Drill version is 1.0.0.
> >>>>
> >>>> So the process goes like this:
> >>>>
> >>>> I have two users involved in this process, called usr1 and usr2. usr1
> >>>> is a kind of admin for the raw data, whereas usr2 is not allowed to
> >>>> access the raw data.
> >>>>
> >>>> usr1 writes 3 csv files to the /raw/costcenter volume and creates a
> >>>> relational model using Drill views. These views are written to
> >>>> /views/costcenter, to which usr2 has access. So usr2 can query these
> >>>> views without any issues.
> >>>>
> >>>> Here is the problem. Along with several other tables, I have 2
> >>>> views, namely fact_account and dim_costcenter (created out of the same
> >>>> csv). Here are the table definitions:
> >>>>
> >>>> describe dfs.views_costcenter.fact_account;
> >>>> +----------------+------------+--------------+
> >>>> |  COLUMN_NAME   | DATA_TYPE  | IS_NULLABLE  |
> >>>> +----------------+------------+--------------+
> >>>> | account_id     | INTEGER    | YES          |
> >>>> | costcenter_id  | INTEGER    | YES          |
> >>>> | account_date   | DATE       | YES          |
> >>>> | account_value  | INTEGER    | YES          |
> >>>> +----------------+------------+--------------+
> >>>>
> >>>> describe dfs.views_costcenter.dim_costcenter;
> >>>> +-----------------------+------------+--------------+
> >>>> |      COLUMN_NAME      | DATA_TYPE  | IS_NULLABLE  |
> >>>> +-----------------------+------------+--------------+
> >>>> | costcenter_id         | INTEGER    | YES          |
> >>>> | costcenter_desc       | VARCHAR    | YES          |
> >>>> | costcenter_name_id    | INTEGER    | YES          |
> >>>> | costcenter_name_desc  | VARCHAR    | YES          |
> >>>> | department_id         | INTEGER    | YES          |
> >>>> | division_id           | INTEGER    | YES          |
> >>>> | area_id               | INTEGER    | YES          |
> >>>> +-----------------------+------------+--------------+
> >>>>
> >>>> Both tables have 582 rows.
> >>>>
> >>>> So I need to join these two tables and run some aggregations on them
> >>>> in order to create a report. I have the following query:
> >>>>
> >>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> >>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> >>>> sum(a11.account_value) as sss from dfs.views_costcenter.fact_account a11
> >>>> join dfs.views_costcenter.dim_costcenter a12 on (a11.costcenter_id =
> >>>> a12.costcenter_id) group by a11.costcenter_id;
> >>>>
> >>>> When I run this query, the execution planner throws a huge exception,
> >>>> as you can see below. However, I've found a strange workaround for it:
> >>>> if I exchange the order of the tables within the join, i.e.
> >>>>
> >>>> select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
> >>>> costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
> >>>> sum(a11.account_value) as sss from dfs.views_costcenter.dim_costcenter a12
> >>>> join dfs.views_costcenter.fact_account a11 on (a11.costcenter_id =
> >>>> a12.costcenter_id) group by a11.costcenter_id;
> >>>>
> >>>> it works perfectly. So in summary, if I write t2 join t1 instead of t1
> >>>> join t2 and change nothing else, it works like a charm. As inner join is
> >>>> commutative and associative, this was completely unexpected for me.
> >>>> Can someone confirm if this is a bug? I didn't want to file a bug in
> >>>> JIRA before asking you guys here first.
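
For reference, the expectation that operand order should not change the result of an inner join can be checked on any SQL engine. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 module rather than Drill, plain tables rather than views over a CSV, and the one-row sample values shown elsewhere in this thread. It says nothing about why the Drill planner fails; it only illustrates that both query forms are expected to return the same rows.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Drill here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_account (
    account_id INTEGER, costcenter_id INTEGER,
    account_date TEXT, account_value INTEGER);
CREATE TABLE dim_costcenter (
    costcenter_id INTEGER, costcenter_desc TEXT,
    costcenter_name_id INTEGER, costcenter_name_desc TEXT,
    department_id INTEGER, division_id INTEGER, area_id INTEGER);
-- One row per table, mirroring the one-row test data in this thread.
INSERT INTO fact_account VALUES (1, 2, '2015-01-01', 3);
INSERT INTO dim_costcenter VALUES (2, 'a', 2, 'a', 3, 4, 5);
""")

SELECT_LIST = """
select a11.costcenter_id as costcenter_id,
       max(a12.costcenter_desc) as costcenter_desc,
       max(a12.costcenter_name_desc) as costcenter_name_desc,
       sum(a11.account_value) as sss
"""
JOIN_COND = "on (a11.costcenter_id = a12.costcenter_id) group by a11.costcenter_id"

# "t1 join t2": fact table on the left.
fact_first = conn.execute(
    SELECT_LIST + "from fact_account a11 join dim_costcenter a12 " + JOIN_COND
).fetchall()

# "t2 join t1": dimension table on the left, nothing else changed.
dim_first = conn.execute(
    SELECT_LIST + "from dim_costcenter a12 join fact_account a11 " + JOIN_COND
).fetchall()

print(fact_first == dim_first)
```

Both orderings return the same single row here, which is the behaviour one would also expect from Drill.
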
> >>>>
> >>>> Thanks in advance for your help.
> >>>>
> >>>> Below you can find the exception: (as the exception is huge, I've only
> >>>> posted part of it here. Please let me know if you need the complete
> >>>> exception)
> >>>>
> >>>>
> >>>> Error: SYSTEM ERROR:
> >>>> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> >>>> [rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be
> >> implemented;
> >>>> planner state:
> >>>>
> >>>> Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
> >>>> Original rel:
> >>>>
> >>
> AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]],
> >>>> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])],
> >>>> sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
> >>>> DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]):
> >> rowcount
> >>>> = 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0
> network,
> >> 0.0
> >>>> memory}, id = 15144
> >>>>   DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]],
> >>>> group=[{0}], costcenter_desc=[MAX($1)],
> costcenter_name_desc=[MAX($2)],
> >>>> sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0
> cpu,
> >> 0.0
> >>>> io, 0.0 network, 0.0 memory}, id = 15142
> >>>>     DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]],
> >>>> costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7],
> >>>> account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows,
> 0.0
> >>>> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
> >>>>       DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0,
> >> 2,
> >>>> 4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9],
> >>>> account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1],
> >>>> costcenter_name_id=[$2], costcenter_name_desc=[$3],
> department_id=[$4],
> >>>> division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost =
> >> {0.0
> >>>> rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
> >>>>         DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0,
> 2,
> >>>> 4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount =
> 1018.0,
> >>>> cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network,
> >> 1791.68
> >>>> memory}, id = 15136
> >>>>           DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0,
> >> 2,
> >>>> 4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6],
> >>>> dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount
> =
> >>>> 1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0
> io,
> >> 0.0
> >>>> network, 0.0 memory}, id = 15127
> >>>>
> >>>> DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]],
> >>>> costcenter_id=[CAST(ITEM($0, 1)):INTEGER],
> >> costcenter_desc=[CAST(ITEM($0,
> >>>> 12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE
> >>>> "ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0,
> >>>> 11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45)
> >>>> CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
> >>>> department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0,
> >>>> 7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0,
> >>>> cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0
> >>>> memory}, id = 15125
> >>>>
> >>>> DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]],
> >> table=[[dfs,
> >>>> raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
> >>>> [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
> >>>> columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10],
> >>>> `columns`[9], `columns`[7], `columns`[5]],
> >>>> files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount
> =
> >>>> 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0
> network,
> >>>> 0.0 memory}, id = 15064
> >>>>           DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0,
> >> 1,
> >>>> 2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
> >>>> dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828
> >> rows,
> >>>> 101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134
> >>>>
> >>>> DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]],
> >>>> group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows,
> >> 1.0
> >>>> cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132
> >>>>
> >>>> DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]],
> >>>> account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0,
> >>>> 1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE],
> >>>> account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0,
> >> cumulative
> >>>> cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id =
> >>>> 15130
> >>>>
> >>>> DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]],
> >> table=[[dfs,
> >>>> raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan
> >>>> [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1,
> >>>> columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]],
> >>>> files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount
> =
> >>>> 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0
> network,
> >>>> 0.0 memory}, id = 15074
> >>>>
> >>>> Sets:
> >>>> Set#16, type: RecordType(ANY columns)
> >>>> rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064,
> >>>> importance=0.4304672100000001
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> *M. Engin Sözer*
> >>>> Junior Datawarehouse Manager
> >>>> [email protected]
> >>>>
> >>>> Goodgame Studios
> >>>> Theodorstr. 42-90, House 9
> >>>> 22761 Hamburg, Germany
> >>>> Phone: +49 (0)40 219 880 -0
> >>>> *www.goodgamestudios.com <http://www.goodgamestudios.com>*
> >>>>
> >>>> Goodgame Studios is a branch of Altigi GmbH
> >>>> Altigi GmbH, District court Hamburg, HRB 99869
> >>>> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
> >>>> Ritter
> >>>>
> >>
> >>
>
>


-- 

*M. Engin Sözer*
Junior Datawarehouse Manager
[email protected]

Goodgame Studios
Theodorstr. 42-90, House 9
22761 Hamburg, Germany
Phone: +49 (0)40 219 880 -0
*www.goodgamestudios.com <http://www.goodgamestudios.com>*

Goodgame Studios is a branch of Altigi GmbH
Altigi GmbH, District court Hamburg, HRB 99869
Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
Ritter
