是的,维度表应该是公共的,使用构件的规则来避免此类问题的发生
------------------------------------------------------------------ 发件人:xixin <[email protected]> 发送时间:2018年10月12日(星期五) 16:17 收件人:user <[email protected]>; 潘博存 <[email protected]> 主 题:Re: 维度表数据发生变化后,sql查询出来的数据还是历史数据 Hi 你的问题总结来说就是:两个数据不一致的Cube同时「满足」了sql的查询,不同Cube给出的结果自然不同。 目前应该是这样:Kylin对于一个sql,如何选择Cube(选择哪个Cube)对用户应该是透明的。实际使用中应该注意这种情况的出现。 个人理解仅供参考 Best regards, Zhao Xixin 在 2018年10月11日,19:28,潘博存 <[email protected]> 写道: 1.kylin 的版本为2.3.1 2.通过看日志发现一个问题, 日常 我们对于维度表,不同的cube是公用一个维度表的,分析了一下,不知道下面的推断是否正确? 1.场景:2个cube 同时用的相同的一个cube,这个时候 hive中的维度表数据发生了变化,kylin中的一个cube 进行了新数据的build, 但是另外一个cube 没有进行build,这个时候 使用kylin 查询维度表,数据 会发生错乱? 2.针对我们生产环境现象是 模型:ChargeBillActiveProtectAnalysisCube 和模型:CHARGEBILL_V3 同时用了 维度表 :电站,但是CHARGEBILL_V3 基于新的业务数据进行了build,但是ChargeBillActiveProtectAnalysisCube 没有进行build。 select * from V_KMSSTATION_V3 where name like '山东%' ---这个sql 从日志看 使用了 cube:CHARGEBILL_V3 --得到了新的结果 select name from V_KMSSTATION_V3 where name like '山东%' --这个sql 从日志看 使用了cube:ChargeBillActiveProtectAnalysisCube --得到了旧的数据 疑问:这种公用维度表的cube,单独使用kylin sql 进行查询时, 使用哪一个cube 是如何判断的呢?不能公用维度表了? 应该如何处理这种场景呢? 相关的明细日志如下: --执行sql:select * from V_KMSSTATION_V3 where name like '山东%' --查询的结果是新数据 执行日志如下: 2018-10-11 11:10:40,589 DEBUG [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache disabled in KylinConfig 2018-10-11 11:10:40,589 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:428 : Using project: TELD 2018-10-11 11:10:40,589 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:429 : The original query: select * from V_KMSSTATION_V3 where name like '山东%' 2018-10-11 11:10:40,591 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:646 : The corrected query: select * from V_KMSSTATION_V3 where name like '山东%' LIMIT 50000 2018-10-11 11:10:40,598 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:58 : Find candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : CUBE[name=CHARGEBILL_V3] 2018-10-11 11:10:40,598 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]] 2018-10-11 11:10:40,599 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]] 2018-10-11 11:10:40,599 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] rules.RealizationSortRule:40 : CUBE[name=CHARGEBILL_V3] priority 1 cost 1447. 2018-10-11 11:10:40,599 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]] 2018-10-11 11:10:40,599 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=CHARGEBILL_V3]],and the final chosen one for current olap context 0 is CUBE[name=CHARGEBILL_V3] 2018-10-11 11:10:41,311 INFO [Scheduler 1381016263 Job cd97b823-4d7f-44cd-a156-6d88d63794d8-445] hive.CreateFlatHiveTableStep:38 : Map 1: 0/42 Map 11: 0/2 Map 12: 0/1 Map 13: 0/1 Map 14: 0/1 Map 15: 0/1 Map 16: 0/1 Map 17: 0/1 Map 18: 0/1 Map 19: 0/1 Map 20: 0/1 Map 21: 0/1 Map 22: 0/1 Map 23: 0/1 Map 24: 0/1 Map 3: 0/8 Map 4: 0/1 Map 5: 0/5 Map 7: 0/5 Map 9: 0/5 Reducer 10: 0/2 Reducer 6: 0/2 Reducer 8: 0/2 2018-10-11 11:10:41,535 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:1068 : Processed rows for each storageContext: 0 2018-10-11 11:10:41,536 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:510 : Stats of SQL response: isException: false, duration: 948, total scan count 0 2018-10-11 11:10:41,536 DEBUG [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache is disabled 2018-10-11 11:10:41,536 INFO [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:328 : ==========================[QUERY]=============================== Query Id: 472468f6-303d-4086-addf-9145cf0985a8 SQL: select * from V_KMSSTATION_V3 where name like '山东%' User: ADMIN Success: true Duration: 0.948 Project: TELD Realization Names: [CUBE[name=CHARGEBILL_V3]] Cuboid Ids: [] Total scan count: 0 Total scan bytes: 0 Result row count: 1 Accept Partial: true Is Partial Result: false Hit Exception Cache: false Storage cache used: false Is Query Push-Down: false Is Prepare: false Trace URL: null Message: null ==========================[QUERY]=============================== 第2个sql: -执行sql:select name from V_KMSSTATION_V3 where name like '山东%' --查询的结果是旧数据 --执行日志如下(奇怪的是这2个查询 日志上显示的cube不一样, 我们在定义时 ChargeBillActiveProtectAnalysisCube 和CHARGEBILL_V3 是公用的) 2018-10-11 11:13:48,547 DEBUG [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache disabled in KylinConfig 2018-10-11 11:13:48,547 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:428 : Using project: TELD 2018-10-11 11:13:48,547 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:429 : The original query: select name from V_KMSSTATION_V3 where name like '山东%' 2018-10-11 11:13:48,549 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:646 : The corrected query: select name from V_KMSSTATION_V3 where name like '山东%' LIMIT 50000 2018-10-11 11:13:48,556 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:58 : Find candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : CUBE[name=ChargeBillActiveProtectAnalysisCube] 2018-10-11 11:13:48,556 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]] 2018-10-11 11:13:48,557 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]] 2018-10-11 11:13:48,557 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] rules.RealizationSortRule:40 : CUBE[name=ChargeBillActiveProtectAnalysisCube] priority 1 cost 422. 2018-10-11 11:13:48,557 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]] 2018-10-11 11:13:48,557 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:75 : The realizations remaining: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],and the final chosen one for current olap context 0 is CUBE[name=ChargeBillActiveProtectAnalysisCube] 2018-10-11 11:13:49,288 INFO [Scheduler 1381016263 Job 0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map 1: 30(+41)/96 Map 11: 2/2 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1 Map 15: 1/1 Map 16: 1/1 Map 17: 1/1 Map 18: 1/1 Map 19: 1/1 Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 1/1 Map 3: 72(+2)/74 Map 4: 1/1 Map 5: 5/5 Map 7: 5/5 Map 9: 5/5 Reducer 10: 2/2 Reducer 6: 2/2 Reducer 8: 2/2 2018-10-11 11:13:49,557 INFO [Scheduler 1381016263 Job 0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map 1: 30(+42)/96 Map 11: 2/2 Map 12: 1/1 Map 13: 1/1 Map 14: 1/1 Map 15: 1/1 Map 16: 1/1 Map 17: 1/1 Map 18: 1/1 Map 19: 1/1 Map 20: 1/1 Map 21: 1/1 Map 22: 1/1 Map 23: 1/1 Map 24: 1/1 Map 3: 72(+2)/74 Map 4: 1/1 Map 5: 5/5 Map 7: 5/5 Map 9: 5/5 Reducer 10: 2/2 Reducer 6: 2/2 Reducer 8: 2/2 2018-10-11 11:13:49,597 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:1068 : Processed rows for each storageContext: 0 2018-10-11 11:13:49,597 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:510 : Stats of SQL response: isException: false, duration: 1050, total scan count 0 2018-10-11 11:13:49,597 DEBUG [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache is disabled 2018-10-11 11:13:49,598 INFO [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:328 : ==========================[QUERY]=============================== Query Id: 32b23403-77d4-40d5-9743-ce2b7f31628a SQL: select name from V_KMSSTATION_V3 where name like '山东%' User: ADMIN Success: true Duration: 1.051 Project: TELD Realization Names: [CUBE[name=ChargeBillActiveProtectAnalysisCube]] Cuboid Ids: [] Total scan count: 0 Total scan bytes: 0 Result row count: 1 Accept Partial: true Is Partial Result: false Hit Exception Cache: false Storage cache used: false Is Query Push-Down: false Is Prepare: false Trace URL: null Message: null ==========================[QUERY]=============================== ------------------------------------------------------------------ 发件人:Na Zhai <[email protected]> 发送时间:2018年10月11日(星期四) 17:38 收件人:[email protected] <[email protected]>; 潘博存 <[email protected]> 主 题:答复: 维度表数据发生变化后,sql查询出来的数据还是历史数据 Hi, bocun What's your Kylin version? Can you provide the log information? 发送自 Windows 10 版邮件应用 发件人: 潘博存 <[email protected]> 发送时间: Thursday, October 11, 2018 3:16:50 PM 收件人: user 抄送: 冯志山 主题: 维度表数据发生变化后,sql查询出来的数据还是历史数据 今天,线上环境,发现一个问题,感觉很奇怪,详请教一下具体原因,以及如何避免以下现象的出现,感谢! 1.基于事实表A和维度表B创建了Cube c,其中 维度表B的字段有 id,code,name. 事实表A通过 字段BID 与B中的id字段进行关联 --B的name 字段是衍生列 2.可以使用kylin正常构件,每天正常构件。 维度表有这样一条记录 id code name 001 001 山东大厦 3.一段时间之后,维度表中的上述数据发生了变化。 id code name 001 001 山东新闻大厦 这个时候再次进行构件 4.在kylin中进行查询(kylin 中会生成事实表和维度表) select * from B --查询结果 id code name 001 001 山东新闻大厦 select name from B --查询结果 name 山东大厦 问题: 为什么通过 select name from B 出来的数据是维度发生变化之前的数据呢,但是通过select * from B 出来的数据 是新数据
