A JIRA ticket has already been created for this issue: https://issues.apache.org/jira/browse/DRILL-3330
Thanks,
Rajkumar Singh
MapR Technologies

On Jun 22, 2015, at 2:04 PM, Mustafa Engin Sözer <[email protected]> wrote:

Hi everyone,

Thanks for the help. I was unable to read through the mails over the
weekend, so I only just saw that the problem has been confirmed here. Do
you need anything from me in order to pinpoint the problem?

Has a JIRA ticket already been created, or should I create one myself?

Thanks again.

On 22 June 2015 at 07:06, Rajkumar Singh <[email protected]> wrote:

Hi Hao,

I have only one row in my sample data; the queries I used to reproduce
the issue are below.

rsingh@Administrators-MacBook-Pro-4 ~/Downloads/apache-drill-1.0.0$ cat sample-data/master-data.csv
1,1,20.4.09,201,Orci Donec Nibh PC,1,[email protected],200,Dai Woodward,300,Hannah Sims,400,Abdul

CREATE VIEW dfs.tmp.`fact_a` AS
select distinct
  CAST(columns[0] as INTEGER) as b_id,
  CAST(columns[1] as INTEGER) as c_id,
  CAST(columns[2] as DATE) as a_date,
  CAST(columns[3] as INTEGER) as a_value
from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
order by b_id, c_id, a_date;

CREATE VIEW dfs.tmp.`dim_c` AS
select
  CAST(columns[1] as INTEGER) as c_id,
  CAST(columns[12] as VARCHAR) as c_desc,
  CAST(columns[11] as INTEGER) as c4_id,
  CAST(columns[10] as VARCHAR) as c4_desc,
  CAST(columns[9] as INTEGER) as c3_id,
  CAST(columns[7] as INTEGER) as c2_id,
  CAST(columns[5] as INTEGER) as c1_id
from dfs.root.`/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv`
order by c_id, c4_id, c3_id, c2_id, c1_id;

select a11.c_id c_id,
       max(a12.c_desc) c_desc,
       max(a12.c4_desc) c4_desc,
       sum(a11.a_value) a_value
from dfs.tmp.`fact_a` a11
join dfs.tmp.`dim_c` a12
  on (a11.c_id = a12.c_id)
group by a11.c_id;

Rajkumar Singh
MapR Technologies

On Jun 21, 2015, at 1:34 AM, Hao Zhu <[email protected]> wrote:

Nice.
How many rows did you create? I am just curious what the difference is
from my reproduction.

Here is mine:

CREATE VIEW view_fact_account AS
select cast(columns[0] as int) account_id,
       cast(columns[1] as int) costcenter_id,
       cast(columns[2] as date) account_date,
       cast(columns[3] as int) account_value
from dfs.drill.fact_account;

CREATE VIEW view_dim_costcenter AS
select cast(columns[0] as int) costcenter_id,
       cast(columns[1] as varchar) costcenter_desc,
       cast(columns[2] as int) costcenter_name_id,
       cast(columns[3] as varchar) costcenter_name_desc,
       cast(columns[4] as int) department_id,
       cast(columns[5] as int) division_id,
       cast(columns[6] as int) area_id
from dfs.drill.dim_costcenter;

Both tables have only 1 row.

0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_dim_costcenter;
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
| costcenter_id  | costcenter_desc  | costcenter_name_id  | costcenter_name_desc  | department_id  | division_id  | area_id  |
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
| 2              | a                | 2                   | a                     | 3              | 4            | 5        |
+----------------+------------------+---------------------+-----------------------+----------------+--------------+----------+
1 row selected (0.255 seconds)

0: jdbc:drill:zk=h2.poc.com:5181,h3.poc.com:5> select * from view_fact_account;
+-------------+----------------+---------------+----------------+
| account_id  | costcenter_id  | account_date  | account_value  |
+-------------+----------------+---------------+----------------+
| 1           | 2              | 2015-01-01    | 3              |
+-------------+----------------+---------------+----------------+
1 row selected (0.141 seconds)

Thanks,
Hao

On Sat, Jun 20, 2015 at 9:31 AM, Rajkumar Singh <[email protected]> wrote:

Hi Hao,

I tried to reproduce the issue and was able to do so. I am running
Drill 1.0.0 in embedded mode with a small data set on my Mac.

For the bad SQL I get the Calcite CannotPlanException (the same error
stack as reported); for the good SQL, the explain plan is below.

00-00 Screen : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.1 rows, 129.1 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1346
00-01   Project(c_id=[$0], c_desc=[$1], c4_desc=[$2], a_value=[$3]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1345
00-02     HashAgg(group=[{0}], c_desc=[MAX($1)], c4_desc=[MAX($2)], a_value=[SUM($3)]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {15.0 rows, 129.0 cpu, 0.0 io, 0.0 network, 187.2 memory}, id = 1344
00-03       Project(c_id=[$8], c_desc=[$1], c4_desc=[$3], a_value=[$10]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, VARCHAR(1) c4_desc, INTEGER a_value): rowcount = 1.0, cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1343
00-04         HashJoin(condition=[=($8, $0)], joinType=[inner]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id, INTEGER b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {14.0 rows, 85.0 cpu, 0.0 io, 0.0 network, 169.6 memory}, id = 1342
00-05           Project(b_id=[$0], c_id0=[$1], a_date=[$2], a_value=[$3]) : rowType = RecordType(INTEGER b_id, INTEGER c_id0, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1341
00-07             SelectionVectorRemover : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {8.0 rows, 35.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1340
00-09               Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {7.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 96.0 memory}, id = 1339
00-11                 SelectionVectorRemover : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {6.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1338
00-13                   Sort(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {5.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 64.0 memory}, id = 1337
00-14                     StreamAgg(group=[{0, 1, 2, 3}]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {4.0 rows, 33.0 cpu, 0.0 io, 0.0 network, 32.0 memory}, id = 1336
00-15                       Sort(sort0=[$0], sort1=[$1], sort2=[$2], sort3=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 32.0 memory}, id = 1335
00-16                         Project(b_id=[CAST(ITEM($0, 0)):INTEGER], c_id=[CAST(ITEM($0, 1)):INTEGER], a_date=[CAST(ITEM($0, 2)):DATE], a_value=[CAST(ITEM($0, 3)):INTEGER]) : rowType = RecordType(INTEGER b_id, INTEGER c_id, DATE a_date, INTEGER a_value): rowcount = 1.0, cumulative cost = {2.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1334
00-17                           Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]], files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]]) : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1333
00-06           SelectionVectorRemover : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {4.0 rows, 30.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1332
00-08             Sort(sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {3.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 56.0 memory}, id = 1331
00-10               Project(c_id=[CAST(ITEM($0, 1)):INTEGER], c_desc=[CAST(ITEM($0, 12)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], c4_id=[CAST(ITEM($0, 11)):INTEGER], c4_desc=[CAST(ITEM($0, 10)):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], c3_id=[CAST(ITEM($0, 9)):INTEGER], c2_id=[CAST(ITEM($0, 7)):INTEGER], c1_id=[CAST(ITEM($0, 5)):INTEGER]) : rowType = RecordType(INTEGER c_id, VARCHAR(1) c_desc, INTEGER c4_id, VARCHAR(1) c4_desc, INTEGER c3_id, INTEGER c2_id, INTEGER c1_id): rowcount = 1.0, cumulative cost = {2.0 rows, 29.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1330
00-12                 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv, numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10], `columns`[9], `columns`[7], `columns`[5]], files=[file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv]]]) : rowType = RecordType(ANY columns): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1329

{
  "head" : {
    "version" : 1,
    "generator" : { "type" : "ExplainHandler", "info" : "" },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "fs-scan",
    "@id" : 12,
    "userName" : "rsingh",
    "files" : [ "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv" ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "file:///",
      "workspaces" : {
        "root" : { "location" : "/", "writable" : false, "defaultInputFormat" : null },
        "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null }
      },
      "formats" : {
        "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" },
        "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
        "tsv" : { "type" : "text", "extensions" : [ "tsv" ], "delimiter" : "\t" },
        "parquet" : { "type" : "parquet" },
        "json" : { "type" : "json" },
        "avro" : { "type" : "avro" }
      }
    },
    "format" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
    "columns" : [ "`columns`[1]", "`columns`[12]", "`columns`[11]", "`columns`[10]", "`columns`[9]", "`columns`[7]", "`columns`[5]" ],
    "selectionRoot" : "/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv",
    "cost" : 1.0
  }, {
    "pop" : "project",
    "@id" : 10,
    "exprs" : [
      { "ref" : "`c_id`", "expr" : "cast( (`columns`[1] ) as INT )" },
      { "ref" : "`c_desc`", "expr" : "cast( (`columns`[12] ) as VARCHAR(1) )" },
      { "ref" : "`c4_id`", "expr" : "cast( (`columns`[11] ) as INT )" },
      { "ref" : "`c4_desc`", "expr" : "cast( (`columns`[10] ) as VARCHAR(1) )" },
      { "ref" : "`c3_id`", "expr" : "cast( (`columns`[9] ) as INT )" },
      { "ref" : "`c2_id`", "expr" : "cast( (`columns`[7] ) as INT )" },
      { "ref" : "`c1_id`", "expr" : "cast( (`columns`[5] ) as INT )" }
    ],
    "child" : 12,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "external-sort",
    "@id" : 8,
    "child" : 10,
    "orderings" : [
      { "expr" : "`c_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c4_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c3_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c2_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" },
      { "expr" : "`c1_id`", "order" : "ASC", "nullDirection" : "UNSPECIFIED" }
    ],
    "reverse" : false,
    "initialAllocation" : 20000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "selection-vector-remover",
    "@id" : 6,
    "child" : 8,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }, {
    "pop" : "fs-scan",
    "@id" : 17,
    "userName" : "rsingh",
    "files" : [ "file:/Users/rsingh/Downloads/apache-drill-1.0.0/sample-data/master-data.csv" ],
    "storage" : {
      "type" : "file",
      "enabled" : true,
      "connection" : "file:///",
      "workspaces" : {
        "root" : { "location" : "/", "writable" : false, "defaultInputFormat" : null },
        "tmp" : { "location" : "/tmp", "writable" : true, "defaultInputFormat" : null }
      },
      "formats" : {
        "psv" : { "type" : "text", "extensions" : [ "tbl" ], "delimiter" : "|" },
        "csv" : { "type" : "text", "extensions" : [ "csv" ], "delimiter" : "," },
        "tsv" : {
...

Rajkumar Singh
MapR Technologies

On Jun 20, 2015, at 9:07 PM, Hao Zhu <[email protected]> wrote:

Hello,

I tried to create the same data in my lab with Drill 1.0 on MapR 4.1;
however, both SQL statements work fine on my end:

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from view_fact_account a11
join view_dim_costcenter a12 on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;
+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.302 seconds)

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from view_dim_costcenter a12
join view_fact_account a11 on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;
+----------------+------------------+-----------------------+------+
| costcenter_id  | costcenter_desc  | costcenter_name_desc  | sss  |
+----------------+------------------+-----------------------+------+
| 2              | a                | a                     | 3    |
+----------------+------------------+-----------------------+------+
1 row selected (0.209 seconds)

To narrow down the issue, could you test a few things?

1. Does this issue only happen with usr2? Do you see the same issue
with usr1 as well? I just want to confirm whether the issue is related
to impersonation or permissions.

2. Does this issue only happen with the 582-row tables? That is, if the
two tables have fewer rows, does the issue still reproduce? In my test I
created only 1 row. I just want to know whether the issue is data
driven.

3. Could you attach the profiles for the good SQL and the bad SQL, so
that the SQL plans are more readable?

Thanks,
Hao

On Thu, Jun 18, 2015 at 6:46 AM, Mustafa Engin Sözer <[email protected]> wrote:

Hi everyone,

I raised this issue in an earlier thread, but it could not be reproduced
and no resolution came out of it. Let me re-describe the issue and my
cluster.

I currently have a 5-node MapR cluster on AWS, with Drill installed.
Security is enabled on both sides, and impersonation is also enabled in
Drill. The only other Drill configuration I changed is the permission
for newly created views, which I set to 750. I am using maprfs; our MapR
version is 4.1.0 and our Drill version is 1.0.0.

The process goes like this: two users are involved, usr1 and usr2.
usr1 is effectively an admin for the raw data, whereas usr2 is not
allowed to access the raw data.

usr1 writes 3 CSV files to the /raw/costcenter volume and creates a
relational model using Drill views. These views are written to
/views/costcenter, which usr2 has access to, so usr2 can query them
without any issues.

Now for the problem. Along with several other tables, I have 2 views,
namely fact_account and dim_costcenter (created from the same CSV).
Here are the table definitions:

describe dfs.views_costcenter.fact_account;
+----------------+------------+--------------+
| COLUMN_NAME    | DATA_TYPE  | IS_NULLABLE  |
+----------------+------------+--------------+
| account_id     | INTEGER    | YES          |
| costcenter_id  | INTEGER    | YES          |
| account_date   | DATE       | YES          |
| account_value  | INTEGER    | YES          |
+----------------+------------+--------------+

describe dfs.views_costcenter.dim_costcenter;
+-----------------------+------------+--------------+
| COLUMN_NAME           | DATA_TYPE  | IS_NULLABLE  |
+-----------------------+------------+--------------+
| costcenter_id         | INTEGER    | YES          |
| costcenter_desc       | VARCHAR    | YES          |
| costcenter_name_id    | INTEGER    | YES          |
| costcenter_name_desc  | VARCHAR    | YES          |
| department_id         | INTEGER    | YES          |
| division_id           | INTEGER    | YES          |
| area_id               | INTEGER    | YES          |
+-----------------------+------------+--------------+

Both tables have 582 rows.

I need to join these two tables and run some aggregations on them in
order to create a report.
I have the following query:

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from dfs.views_costcenter.fact_account a11
join dfs.views_costcenter.dim_costcenter a12
  on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;

When I run this query, the execution planner throws a huge exception, as
you can see below. However, I have found a strange workaround: if I
exchange the order of the tables within the join, i.e.

select a11.costcenter_id as costcenter_id, max(a12.costcenter_desc) as
costcenter_desc, max(a12.costcenter_name_desc) as costcenter_name_desc,
sum(a11.account_value) as sss
from dfs.views_costcenter.dim_costcenter a12
join dfs.views_costcenter.fact_account a11
  on (a11.costcenter_id = a12.costcenter_id)
group by a11.costcenter_id;

it works perfectly. In summary, if I write t2 join t1 instead of t1 join
t2 and change nothing else, it works like a charm. Since an inner join
is commutative and associative, this was completely unexpected. Can
someone confirm whether this is a bug? I did not want to file a JIRA
issue before asking here first.

Thanks in advance for your help.

Below is the exception. As it is huge, I have only posted part of it
here; please let me know if you need the complete exception.

Error: SYSTEM ERROR: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node [rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]] could not be implemented; planner state:

Root: rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]
Original rel:
AbstractConverter(subset=[rel#15146:Subset#27.PHYSICAL.SINGLETON([]).[]], convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): rowcount = 101.8, cumulative cost = {inf}, id = 15148
  DrillScreenRel(subset=[rel#15145:Subset#27.LOGICAL.ANY([]).[]]): rowcount = 101.8, cumulative cost = {10.18 rows, 10.18 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15144
    DrillAggregateRel(subset=[rel#15143:Subset#26.LOGICAL.ANY([]).[]], group=[{0}], costcenter_desc=[MAX($1)], costcenter_name_desc=[MAX($2)], sss=[SUM($3)]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15142
      DrillProjectRel(subset=[rel#15141:Subset#25.LOGICAL.ANY([]).[]], costcenter_id=[$1], costcenter_desc=[$5], costcenter_name_desc=[$7], account_value=[$3]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15140
        DrillProjectRel(subset=[rel#15139:Subset#24.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], account_id=[$7], costcenter_id=[$8], account_date=[$9], account_value=[$10], costcenter_id0=[$0], costcenter_desc=[$1], costcenter_name_id=[$2], costcenter_name_desc=[$3], department_id=[$4], division_id=[$5], area_id=[$6]): rowcount = 1018.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15138
          DrillJoinRel(subset=[rel#15137:Subset#23.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], condition=[=($8, $0)], joinType=[inner]): rowcount = 1018.0, cumulative cost = {1119.8 rows, 13030.4 cpu, 0.0 io, 0.0 network, 1791.68 memory}, id = 15136
            DrillSortRel(subset=[rel#15128:Subset#18.LOGICAL.ANY([]).[0, 2, 4, 5, 6]], sort0=[$0], sort1=[$2], sort2=[$4], sort3=[$5], sort4=[$6], dir0=[ASC], dir1=[ASC], dir2=[ASC], dir3=[ASC], dir4=[ASC]): rowcount = 1018.0, cumulative cost = {197407.16549843678 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15127
              DrillProjectRel(subset=[rel#15126:Subset#17.LOGICAL.ANY([]).[]], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], costcenter_desc=[CAST(ITEM($0, 12)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], costcenter_name_id=[CAST(ITEM($0, 11)):INTEGER], costcenter_name_desc=[CAST(ITEM($0, 10)):VARCHAR(45) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], department_id=[CAST(ITEM($0, 9)):INTEGER], division_id=[CAST(ITEM($0, 7)):INTEGER], area_id=[CAST(ITEM($0, 5)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 28504.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15125
                DrillScanRel(subset=[rel#15124:Subset#16.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[1], `columns`[12], `columns`[11], `columns`[10], `columns`[9], `columns`[7], `columns`[5]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15064
            DrillSortRel(subset=[rel#15135:Subset#22.LOGICAL.ANY([]).[0, 1, 2]], sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC], dir2=[ASC]): rowcount = 101.8, cumulative cost = {7529.958857584828 rows, 101.8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15134
              DrillAggregateRel(subset=[rel#15133:Subset#21.LOGICAL.ANY([]).[]], group=[{0, 1, 2, 3}]): rowcount = 101.8, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15132
                DrillProjectRel(subset=[rel#15131:Subset#20.LOGICAL.ANY([]).[]], account_id=[CAST(ITEM($0, 0)):INTEGER], costcenter_id=[CAST(ITEM($0, 1)):INTEGER], account_date=[CAST(ITEM($0, 2)):DATE], account_value=[CAST(ITEM($0, 3)):INTEGER]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 16288.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15130
                  DrillScanRel(subset=[rel#15129:Subset#19.LOGICAL.ANY([]).[]], table=[[dfs, raw_costcenter, master_datev*.csv]], groupscan=[EasyGroupScan [selectionRoot=/raw/costcenter/master_datev*.csv, numFiles=1, columns=[`columns`[0], `columns`[1], `columns`[2], `columns`[3]], files=[maprfs:/raw/costcenter/master_datev_20150617.csv]]]): rowcount = 1018.0, cumulative cost = {1018.0 rows, 1018.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 15074

Sets:
Set#16, type: RecordType(ANY columns)
  rel#15124:Subset#16.LOGICAL.ANY([]).[], best=rel#15064, importance=0.4304672100000001
...

--

M. Engin Sözer
Junior Datawarehouse Manager
[email protected]

Goodgame Studios
Theodorstr. 42-90, House 9
22761 Hamburg, Germany
Phone: +49 (0)40 219 880-0
www.goodgamestudios.com

Goodgame Studios is a branch of Altigi GmbH
Altigi GmbH, District court Hamburg, HRB 99869
Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian Ritter
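[Archive editor's note] The workaround discussed in this thread hinges on the point that an inner join is commutative: swapping the operand order must not change the result set, which is why the planner failing on only one ordering looks like a bug. A minimal sketch of that equivalence, using Python's sqlite3 as a stand-in for Drill (the table shapes and the single row mirror Hao's one-row reproduction above; this is an illustration, not a Drill test):

```python
import sqlite3

# In-memory tables mirroring the view_fact_account / view_dim_costcenter repro.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_account(account_id INT, costcenter_id INT,
                          account_date TEXT, account_value INT);
CREATE TABLE dim_costcenter(costcenter_id INT, costcenter_desc TEXT,
                            costcenter_name_id INT, costcenter_name_desc TEXT,
                            department_id INT, division_id INT, area_id INT);
INSERT INTO fact_account VALUES (1, 2, '2015-01-01', 3);
INSERT INTO dim_costcenter VALUES (2, 'a', 2, 'a', 3, 4, 5);
""")

# The ordering that fails in the report: fact join dim.
q1 = """select a11.costcenter_id, max(a12.costcenter_desc),
               max(a12.costcenter_name_desc), sum(a11.account_value)
        from fact_account a11
        join dim_costcenter a12 on a11.costcenter_id = a12.costcenter_id
        group by a11.costcenter_id"""

# The workaround ordering: dim join fact, everything else unchanged.
q2 = """select a11.costcenter_id, max(a12.costcenter_desc),
               max(a12.costcenter_name_desc), sum(a11.account_value)
        from dim_costcenter a12
        join fact_account a11 on a11.costcenter_id = a12.costcenter_id
        group by a11.costcenter_id"""

r1 = con.execute(q1).fetchall()
r2 = con.execute(q2).fetchall()
print(r1)        # [(2, 'a', 'a', 3)]
print(r1 == r2)  # True: inner join is commutative, both orderings agree
```

Both orderings return the same single aggregated row, matching the results Hao posted; the expectation in the thread is that Drill should plan both queries identically.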
