1.kylin 的版本为2.3.1
   2.通过看日志发现一个问题,
     日常 我们对于维度表,不同的cube是公用一个维度表的,分析了一下,不知道下面的推断是否正确?

    1.场景:2个cube 同时用的相同的一个cube,这个时候 hive中的维度表数据发生了变化,kylin中的一个cube 进行了新数据的build, 
但是另外一个cube 没有进行build,这个时候 使用kylin 查询维度表,数据
      会发生错乱?
    2.针对我们生产环境现象是
       模型:ChargeBillActiveProtectAnalysisCube 和模型:CHARGEBILL_V3  同时用了 维度表 
:电站,但是CHARGEBILL_V3 基于新的业务数据进行了build,但是ChargeBillActiveProtectAnalysisCube 
没有进行build。

      select * from V_KMSSTATION_V3 where name like '山东%'    ---这个sql 从日志看 使用了 
cube:CHARGEBILL_V3   --得到了新的结果
      select name from V_KMSSTATION_V3 where name like '山东%'  --这个sql 从日志看 
使用了cube:ChargeBillActiveProtectAnalysisCube  --得到了旧的数据

  疑问:这种公用维度表的cube,单独使用kylin sql 进行查询时, 使用哪一个cube 是如何判断的呢?不能公用维度表了? 应该如何处理这种场景呢?

     相关的明细日志如下:
      --执行sql:select * from V_KMSSTATION_V3 where name like '山东%'  --查询的结果是新数据
 执行日志如下:
2018-10-11 11:10:40,589 DEBUG [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache 
disabled in KylinConfig
2018-10-11 11:10:40,589 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:428 : Using 
project: TELD
2018-10-11 11:10:40,589 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:429 : The 
original query:  select * from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:10:40,591 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:646 : The 
corrected query: select * from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:10:40,598 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:58 : Find 
candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : 
CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:40,598 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, 
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: 
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, 
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: 
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] rules.RealizationSortRule:40 : 
CUBE[name=CHARGEBILL_V3] priority 1 cost 1447. 
2018-10-11 11:10:40,599 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51 : Applying 
rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, 
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: 
[CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:75 : The 
realizations remaining: [CUBE[name=CHARGEBILL_V3]],and the final chosen one for 
current olap context 0 is CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:41,311 INFO  [Scheduler 1381016263 Job 
cd97b823-4d7f-44cd-a156-6d88d63794d8-445] hive.CreateFlatHiveTableStep:38 : Map 
1: 0/42 Map 11: 0/2     Map 12: 0/1     Map 13: 0/1     Map 14: 0/1     Map 15: 
0/1     Map 16: 0/1     Map 17: 0/1    Map 18: 0/1      Map 19: 0/1     Map 20: 
0/1     Map 21: 0/1     Map 22: 0/1     Map 23: 0/1     Map 24: 0/1     Map 3: 
0/8      Map 4: 0/1      Map 5: 0/5      Map 7: 0/5      Map 9: 0/5      
Reducer 10: 0/2 Reducer 6: 0/2  Reducer 8: 0/2
2018-10-11 11:10:41,535 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:1068 : 
Processed rows for each storageContext: 0 
2018-10-11 11:10:41,536 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:510 : Stats 
of SQL response: isException: false, duration: 948, total scan count 0
2018-10-11 11:10:41,536 DEBUG [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35 : query cache is 
disabled
2018-10-11 11:10:41,536 INFO  [Query 
472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:328 : 
==========================[QUERY]===============================
Query Id: 472468f6-303d-4086-addf-9145cf0985a8
SQL: select * from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 0.948
Project: TELD
Realization Names: [CUBE[name=CHARGEBILL_V3]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================



第2个sql:
   -执行sql:select name from V_KMSSTATION_V3 where name like '山东%'  --查询的结果是旧数据

--执行日志如下(奇怪的是这2个查询 日志上显示的cube不一样,
我们在定义时 ChargeBillActiveProtectAnalysisCube 和CHARGEBILL_V3 是公用的)
2018-10-11 11:13:48,547 DEBUG [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache 
disabled in KylinConfig
2018-10-11 11:13:48,547 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:428 : Using 
project: TELD
2018-10-11 11:13:48,547 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:429 : The 
original query:  select name from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:13:48,549 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:646 : The 
corrected query: select name from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:13:48,556 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:58 : Find 
candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : 
CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:48,556 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule, 
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], 
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying 
rule: class 
org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule, 
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], 
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] rules.RealizationSortRule:40 : 
CUBE[name=ChargeBillActiveProtectAnalysisCube] priority 1 cost 422. 
2018-10-11 11:13:48,557 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51 : Applying 
rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, 
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], 
realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:75 : The 
realizations remaining: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],and 
the final chosen one for current olap context 0 is 
CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:49,288 INFO  [Scheduler 1381016263 Job 
0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map 
1: 30(+41)/96   Map 11: 2/2     Map 12: 1/1     Map 13: 1/1     Map 14: 1/1     
Map 15: 1/1     Map 16: 1/1     Map 17: 1/1     Map 18: 1/1     Map 19: 1/1     
Map 20: 1/1     Map 21: 1/1     Map 22: 1/1     Map 23: 1/1     Map 24: 1/1     
Map 3: 72(+2)/74        Map 4: 1/1      Map 5: 5/5      Map 7: 5/5      Map 9: 
5/5      Reducer 10: 2/2 Reducer 6: 2/2  Reducer 8: 2/2
2018-10-11 11:13:49,557 INFO  [Scheduler 1381016263 Job 
0ad7c900-c8d9-4af5-bac0-58db06af4a67-378] hive.CreateFlatHiveTableStep:38 : Map 
1: 30(+42)/96   Map 11: 2/2     Map 12: 1/1     Map 13: 1/1     Map 14: 1/1     
Map 15: 1/1     Map 16: 1/1     Map 17: 1/1     Map 18: 1/1     Map 19: 1/1     
Map 20: 1/1     Map 21: 1/1     Map 22: 1/1     Map 23: 1/1     Map 24: 1/1     
Map 3: 72(+2)/74        Map 4: 1/1      Map 5: 5/5      Map 7: 5/5      Map 9: 
5/5      Reducer 10: 2/2 Reducer 6: 2/2  Reducer 8: 2/2
2018-10-11 11:13:49,597 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:1068 : 
Processed rows for each storageContext: 0 
2018-10-11 11:13:49,597 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:510 : Stats 
of SQL response: isException: false, duration: 1050, total scan count 0
2018-10-11 11:13:49,597 DEBUG [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35 : query cache is 
disabled
2018-10-11 11:13:49,598 INFO  [Query 
32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:328 : 
==========================[QUERY]===============================
Query Id: 32b23403-77d4-40d5-9743-ce2b7f31628a
SQL: select name from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 1.051
Project: TELD
Realization Names: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================






------------------------------------------------------------------
发件人:Na Zhai <[email protected]>
发送时间:2018年10月11日(星期四) 17:38
收件人:[email protected] <[email protected]>; 潘博存 <[email protected]>
主 题:答复: 维度表数据发生变化后,sql查询出来的数据还是历史数据


Hi, bocun
What's your Kylin version? Can you provide the log information? 
发送自 Windows 10 版邮件应用
 
发件人: 潘博存 <[email protected]>
发送时间: Thursday, October 11, 2018 3:16:50 PM
收件人: user
抄送: 冯志山
主题: 维度表数据发生变化后,sql查询出来的数据还是历史数据
今天,线上环境,发现一个问题,感觉很奇怪,详请教一下具体原因,以及如何避免以下现象的出现,感谢!

1.基于事实表A和维度表B创建了Cube c,其中 维度表B的字段有 id,code,name. 事实表A通过 字段BID 与B中的id字段进行关联 
--B的name 字段是衍生列
 2.可以使用kylin正常构件,每天正常构件。
  维度表有这样一条记录
   id   code   name
   001  001     山东大厦
 3.一段时间之后,维度表中的上述数据发生了变化。

  id   code   name
  001  001    山东新闻大厦

  这个时候再次进行构件
 4.在kylin中进行查询(kylin 中会生成事实表和维度表)
  select * from B   --查询结果   
   id  code  name
   001  001  山东新闻大厦

   select name from B --查询结果
    name
    山东大厦
  问题:
   为什么通过 select name from B  出来的数据是维度发生变化之前的数据呢,但是通过select * from B 出来的数据 是新数据

Reply via email to