1.kylin 的版本为2.3.1
2.通过看日志发现一个问题,
日常 我们对于维度表,不同的cube是公用一个维度表的,分析了一下,不知道下面的推断是否正确?
1.场景:2个cube 同时用的相同的一个cube,这个时候 hive中的维度表数据发生了变化,kylin中的一个cube 进行了新数据的build,
但是另外一个cube 没有进行build,这个时候 使用kylin 查询维度表,数据
会发生错乱?
2.针对我们生产环境现象是
模型:ChargeBillActiveProtectAnalysisCube 和模型:CHARGEBILL_V3 同时用了 维度表
:电站,但是CHARGEBILL_V3 基于新的业务数据进行了build,但是ChargeBillActiveProtectAnalysisCube
没有进行build。
select * from V_KMSSTATION_V3 where name like '山东%' ---这个sql 从日志看 使用了
cube:CHARGEBILL_V3 --得到了新的结果
select name from V_KMSSTATION_V3 where name like '山东%' --这个sql 从日志看
使用了cube:ChargeBillActiveProtectAnalysisCube --得到了旧的数据
疑问:这种公用维度表的cube,单独使用kylin sql 进行查询时, 使用哪一个cube 是如何判断的呢?不能公用维度表了? 应该如何处理这种场景呢?
相关的明细日志如下:
--执行sql:select * from V_KMSSTATION_V3 where name like '山东%' --查询的结果是新数据
执行日志如下:
2018-10-11 11:10:40,589 DEBUG [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache
disabled in KylinConfig
2018-10-11 11:10:40,589 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:428 : Using
project: TELD
2018-10-11 11:10:40,589 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:429 : The
original query: select * from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:10:40,591 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:646 : The
corrected query: select * from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:10:40,598 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:58 : Find
candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD :
CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:40,598 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying
rule: class
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule,
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after:
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying
rule: class
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after:
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] rules.RealizationSortRule:40 :
CUBE[name=CHARGEBILL_V3] priority 1 cost 1447.
2018-10-11 11:10:40,599 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying
rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after:
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:75 : The
realizations remaining: [CUBE[name=CHARGEBILL_V3]],and the final chosen one for
current olap context 0 is CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:41,311 INFO [Scheduler 1381016263 Job
cd97b823-4d7f-44cd-a156-6d88d63794d8-445] hive.CreateFlatHiveTableStep:38 : Map
1: 0/42 Map 11: 0/2 Map 12: 0/1 Map 13: 0/1 Map 14: 0/1 Map 15:
0/1 Map 16: 0/1 Map 17: 0/1 Map 18: 0/1 Map 19: 0/1 Map 20:
0/1 Map 21: 0/1 Map 22: 0/1 Map 23: 0/1 Map 24: 0/1 Map 3:
0/8 Map 4: 0/1 Map 5: 0/5 Map 7: 0/5 Map 9: 0/5
Reducer 10: 0/2 Reducer 6: 0/2 Reducer 8: 0/2
2018-10-11 11:10:41,535 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:1068 :
Processed rows for each storageContext: 0
2018-10-11 11:10:41,536 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:510 : Stats
of SQL response: isException: false, duration: 948, total scan count 0
2018-10-11 11:10:41,536 DEBUG [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache is
disabled
2018-10-11 11:10:41,536 INFO [Query
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:328 :
==========================[QUERY]===============================
Query Id: 472468f6-303d-4086-addf-9145cf0985a8
SQL: select * from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 0.948
Project: TELD
Realization Names: [CUBE[name=CHARGEBILL_V3]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================
第2个sql:
-执行sql:select name from V_KMSSTATION_V3 where name like '山东%' --查询的结果是旧数据
--执行日志如下(奇怪的是这2个查询 日志上显示的cube不一样,
我们在定义时 ChargeBillActiveProtectAnalysisCube 和CHARGEBILL_V3 是公用的)
2018-10-11 11:13:48,547 DEBUG [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache
disabled in KylinConfig
2018-10-11 11:13:48,547 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:428 : Using
project: TELD
2018-10-11 11:13:48,547 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:429 : The
original query: select name from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:13:48,549 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:646 : The
corrected query: select name from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:13:48,556 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:58 : Find
candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD :
CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:48,556 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying
rule: class
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule,
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying
rule: class
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] rules.RealizationSortRule:40 :
CUBE[name=ChargeBillActiveProtectAnalysisCube] priority 1 cost 422.
2018-10-11 11:13:48,557 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying
rule: class org.apache.kylin.query.routing.rules.RealizationSortRule,
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:75 : The
realizations remaining: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],and
the final chosen one for current olap context 0 is
CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:49,288 INFO [Scheduler 1381016263 Job
0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map
1: 30(+41)/96 Map 11: 2/2 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1
Map 15: 1/1 Map 16: 1/1 Map 17: 1/1 Map 18: 1/1 Map 19: 1/1
Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 1/1
Map 3: 72(+2)/74 Map 4: 1/1 Map 5: 5/5 Map 7: 5/5 Map 9:
5/5 Reducer 10: 2/2 Reducer 6: 2/2 Reducer 8: 2/2
2018-10-11 11:13:49,557 INFO [Scheduler 1381016263 Job
0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map
1: 30(+42)/96 Map 11: 2/2 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1
Map 15: 1/1 Map 16: 1/1 Map 17: 1/1 Map 18: 1/1 Map 19: 1/1
Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 1/1
Map 3: 72(+2)/74 Map 4: 1/1 Map 5: 5/5 Map 7: 5/5 Map 9:
5/5 Reducer 10: 2/2 Reducer 6: 2/2 Reducer 8: 2/2
2018-10-11 11:13:49,597 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:1068 :
Processed rows for each storageContext: 0
2018-10-11 11:13:49,597 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:510 : Stats
of SQL response: isException: false, duration: 1050, total scan count 0
2018-10-11 11:13:49,597 DEBUG [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache is
disabled
2018-10-11 11:13:49,598 INFO [Query
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:328 :
==========================[QUERY]===============================
Query Id: 32b23403-77d4-40d5-9743-ce2b7f31628a
SQL: select name from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 1.051
Project: TELD
Realization Names: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================
------------------------------------------------------------------
发件人:Na Zhai <[email protected]>
发送时间:2018年10月11日(星期四) 17:38
收件人:[email protected] <[email protected]>; 潘博存 <[email protected]>
主 题:答复: 维度表数据发生变化后,sql查询出来的数据还是历史数据
Hi, bocun
What's your Kylin version? Can you provide the log information?
发送自 Windows 10 版邮件应用
发件人: 潘博存 <[email protected]>
发送时间: Thursday, October 11, 2018 3:16:50 PM
收件人: user
抄送: 冯志山
主题: 维度表数据发生变化后,sql查询出来的数据还是历史数据
今天,线上环境,发现一个问题,感觉很奇怪,详请教一下具体原因,以及如何避免以下现象的出现,感谢!
1.基于事实表A和维度表B创建了Cube c,其中 维度表B的字段有 id,code,name. 事实表A通过 字段BID 与B中的id字段进行关联
--B的name 字段是衍生列
2.可以使用kylin正常构件,每天正常构件。
维度表有这样一条记录
id code name
001 001 山东大厦
3.一段时间之后,维度表中的上述数据发生了变化。
id code name
001 001 山东新闻大厦
这个时候再次进行构件
4.在kylin中进行查询(kylin 中会生成事实表和维度表)
select * from B --查询结果
id code name
001 001 山东新闻大厦
select name from B --查询结果
name
山东大厦
问题:
为什么通过 select name from B 出来的数据是维度发生变化之前的数据呢,但是通过select * from B 出来的数据 是新数据