Shaofeng is right. Check out details -- https://issues.apache.org/jira/browse/KYLIN-1749
On Wed, Apr 5, 2017 at 9:45 AM, ShaoFeng Shi <shaofeng...@apache.org> wrote: > Hi bingli, > > I didn't try a agg group with only 1 dimension; please check whether > removing the three single dim group to see whether it can work. Anyway, > this is a bug I think. > > Regarding "precicely define combination with agg group", yes it is doable > with agg group; say if you only want to use the combination ABCD, you can > make them into a group, and then mark all these 4 as "mandatory", then for > this group, only 1 cuboid will be calculated (otherwise will be 16). While, > in older Kylin versions, this isn't allowed, so you need configure " > kylin.cube.aggrgroup.isMandatoryOnlyValid=true" in kylin.properties. > > 2017-04-05 9:24 GMT+08:00 bing...@iflytek.com <bing...@iflytek.com>: > >> 你好,李杨: >> 为什么kylin 最终解析的cuboids 与 我通过页面设计的不一致。这是不是 aggregation groups的一个BUG? >> >> 不一致,侧面验证是执行如下查询语句报错误,错误详见附件: >> select ts_hour, sum(request) >> from view_flow_insight >> group by ts_hour >> >> >> 你给的文章我早先也拜读过,另外《Apache Kylin 权威指南》一书中指出:“聚合组的设计非常灵活,甚至可以用来描 >> 述一些极端的设计。假设我们的业务需求非常单一,只需要 >> 某些特定的Cuboid,那么可以创建多个聚合组,每个聚合组代表一个Cuboid,.................... >> ..........”。根据以上资料,我设计了符合我业务需求的 Cube(由于展示层使用superset,无法 >> 使用多表,所以只能使用视图转成一张表),最终存在一些 cuboid无法查询。 >> >> ------------------------------ >> bing...@iflytek.com >> >> >> *From:* Li Yang <liy...@apache.org> >> *Date:* 2017-04-04 17:26 >> *To:* user <user@kylin.apache.org> >> *CC:* ShaoFeng?Shi <shaofeng...@apache.org> >> *Subject:* Re: Re: How Kylin Cuboid Scheduler Work With Aggregation >> Groups ? >> Google "Kylin aggregation group" and the first result is: >> http://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ >> >> On Mon, Apr 3, 2017 at 12:03 PM, bing...@iflytek.com <bing...@iflytek.com >> > wrote: >> >>> 你好,少峰: >>> kylin cube中,无论是使用 aggregation >>> group还是其他cube优化策略,最终得到的都是一系列组合(如:<day_time, >>> gender>),而这些组合实际上是与 cuboid 唯一对应的。 >>> 在使用sql查询的时候,如果没有对应的 cuobid,那么查询是失败的(排除 extend、derived的维度组合)。 >>> >>> 下图是,apache kylin官网对 aggregation group的解析。按同样的规则,在上封邮件中定义的Cube, >>> 应该只会产生10种维度组合,即 >>> <day_time, gender> 576 >>> <day_time, age> 544 >>> <day_time, brand> 528 >>> <day_time, model> 520 >>> <day_time, resolution> 516 >>> <day_time, os_version> 514 >>> <day_time, ntt> 513 >>> <ts_minute> 256 >>> <ts_hour> 128 >>> <day_time> 512 >>> 对应的 cuboid 为后面的数字。从cube_statistics任务中看,最后只有 1023,516,576,513,528, >>> 514,544,520 这些组合(查询Hbase Meta表也是这种情况)。 >>> >>> 在 《Apache Kylin 权威指南》一书中,有介绍在一些极端情况下(如:precisely define the >>> cuboids/combinations) aggregation group 的使用方法。 >>> 所以,我以为目前 kylin 是支持这种定义方法的。 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------ >>> bing...@iflyek.com >>> >>> >>> *From:* ShaoFeng Shi <shaofeng...@apache.org> >>> *Date:* 2017-04-02 22:17 >>> *To:* user <user@kylin.apache.org> >>> *Subject:* Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ? >>> Hi Bing, >>> >>> An aggregation group is a dimension group, or say a sub-cube; it is NOT >>> a cuboid. >>> >>> I guess you want to precisely define the cuboids/combinations, that >>> isn't supported as in many cases user couldn't list all the combinations >>> they use. But you can describe them with the agg group / mandatory / joint >>> as close as possible. >>> >>> 2017-03-31 15:49 GMT+08:00 bing...@iflytek.com <bing...@iflytek.com>: >>> >>>> Hi,all >>>> I have a Cube, the desc is : >>>> >>>> { >>>> "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446", >>>> "last_modified": 1490860973892, >>>> "version": "1.6.0", >>>> "name": "adx_flow_insight", >>>> "model_name": "adx_operator", >>>> "description": "", >>>> "null_string": null, >>>> "dimensions": [ >>>> { >>>> "name": "GENDER", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "GENDER", >>>> "derived": null >>>> }, >>>> { >>>> "name": "AGE", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "AGE", >>>> "derived": null >>>> }, >>>> { >>>> "name": "BRAND", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "BRAND", >>>> "derived": null >>>> }, >>>> { >>>> "name": "MODEL", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "MODEL", >>>> "derived": null >>>> }, >>>> { >>>> "name": "RESOLUTION", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "RESOLUTION", >>>> "derived": null >>>> }, >>>> { >>>> "name": "OS_VERSION", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "OS_VERSION", >>>> "derived": null >>>> }, >>>> { >>>> "name": "NTT", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "NTT", >>>> "derived": null >>>> }, >>>> { >>>> "name": "TS_MINUTE", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "TS_MINUTE", >>>> "derived": null >>>> }, >>>> { >>>> "name": "TS_HOUR", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "TS_HOUR", >>>> "derived": null >>>> }, >>>> { >>>> "name": "DAY_TIME", >>>> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT", >>>> "column": "DAY_TIME", >>>> "derived": null >>>> } >>>> ], >>>> "measures": [ >>>> { >>>> "name": "_COUNT_", >>>> "function": { >>>> "expression": "COUNT", >>>> "parameter": { >>>> "type": "constant", >>>> "value": "1", >>>> "next_parameter": null >>>> }, >>>> "returntype": "bigint" >>>> }, >>>> "dependent_measure_ref": null >>>> }, >>>> { >>>> "name": "REQUEST_PV", >>>> "function": { >>>> "expression": "SUM", >>>> "parameter": { >>>> "type": "column", >>>> "value": "REQUEST", >>>> "next_parameter": null >>>> }, >>>> "returntype": "bigint" >>>> }, >>>> "dependent_measure_ref": null >>>> }, >>>> { >>>> "name": "IMPRESS_PV", >>>> "function": { >>>> "expression": "SUM", >>>> "parameter": { >>>> "type": "column", >>>> "value": "IMPRESS", >>>> "next_parameter": null >>>> }, >>>> "returntype": "bigint" >>>> }, >>>> "dependent_measure_ref": null >>>> }, >>>> { >>>> "name": "CLICK_PV", >>>> "function": { >>>> "expression": "SUM", >>>> "parameter": { >>>> "type": "column", >>>> "value": "CLICK", >>>> "next_parameter": null >>>> }, >>>> "returntype": "bigint" >>>> }, >>>> "dependent_measure_ref": null >>>> }, >>>> { >>>> "name": "FILL_PV", >>>> "function": { >>>> "expression": "SUM", >>>> "parameter": { >>>> "type": "column", >>>> "value": "FILL", >>>> "next_parameter": null >>>> }, >>>> "returntype": "bigint" >>>> }, >>>> "dependent_measure_ref": null >>>> }, >>>> { >>>> "name": "UV_DID", >>>> "function": { >>>> "expression": "COUNT_DISTINCT", >>>> "parameter": { >>>> "type": "column", >>>> "value": "DID", >>>> "next_parameter": null >>>> }, >>>> "returntype": "hllc(10)" >>>> }, >>>> "dependent_measure_ref": null >>>> } >>>> ], >>>> "dictionaries": [], >>>> "rowkey": { >>>> "rowkey_columns": [ >>>> { >>>> "column": "DAY_TIME", >>>> "encoding": "date", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "TS_MINUTE", >>>> "encoding": "integer:4", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "TS_HOUR", >>>> "encoding": "integer:4", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "GENDER", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "AGE", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "BRAND", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "MODEL", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "RESOLUTION", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "OS_VERSION", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> }, >>>> { >>>> "column": "NTT", >>>> "encoding": "dict", >>>> "isShardBy": false >>>> } >>>> ] >>>> }, >>>> "hbase_mapping": { >>>> "column_family": [ >>>> { >>>> "name": "F1", >>>> "columns": [ >>>> { >>>> "qualifier": "M", >>>> "measure_refs": [ >>>> "_COUNT_", >>>> "REQUEST_PV", >>>> "IMPRESS_PV", >>>> "CLICK_PV", >>>> "FILL_PV" >>>> ] >>>> } >>>> ] >>>> }, >>>> { >>>> "name": "F2", >>>> "columns": [ >>>> { >>>> "qualifier": "M", >>>> "measure_refs": [ >>>> "UV_DID" >>>> ] >>>> } >>>> ] >>>> } >>>> ] >>>> }, >>>> "aggregation_groups": [ >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "GENDER" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "AGE" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "BRAND" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "MODEL" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "RESOLUTION" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "RESOLUTION" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "OS_VERSION" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "DAY_TIME", >>>> "NTT" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "DAY_TIME" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "TS_MINUTE" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "TS_MINUTE" >>>> ], >>>> "joint_dims": [] >>>> } >>>> }, >>>> { >>>> "includes": [ >>>> "TS_HOUR" >>>> ], >>>> "select_rule": { >>>> "hierarchy_dims": [], >>>> "mandatory_dims": [ >>>> "TS_HOUR" >>>> ], >>>> "joint_dims": [] >>>> } >>>> } >>>> ], >>>> "signature": "DSSmByHn2sATiETlBdjANQ==", >>>> "notify_list": [], >>>> "status_need_notify": [ >>>> "ERROR", >>>> "DISCARDED", >>>> "SUCCEED" >>>> ], >>>> "partition_date_start": 1488326400000, >>>> "partition_date_end": 3153600000000, >>>> "auto_merge_time_ranges": [ >>>> 604800000, >>>> 2419200000 >>>> ], >>>> "retention_range": 0, >>>> "engine_type": 2, >>>> "storage_type": 2, >>>> "override_kylin_properties": { >>>> "kylin.job.mr.config.override.mapreduce.job.queuename": "ad" >>>> } >>>> >>>> } >>>> >>>> There have 10 dims, and use aggregation groups. I want Cube only >>>> contains 10 combs: >>>> <day_time, gender> 576 >>>> <day_time, age> 544 >>>> <day_time, brand> 528 >>>> <day_time, model> 520 >>>> <day_time, resolution> 516 >>>> <day_time, os_version> 514 >>>> <day_time, ntt> 513 >>>> <ts_minute> 256 >>>> <ts_hour> 128 >>>> <day_time> 512 >>>> >>>> But the Cuboid Scheduler parse as follower: >>>> >>>> 2017-03-31 15:32:47,735 (main) [INFO - org..apache.kylin.cub >>>> e.CubeManager.loadAllCubeInstance(CubeManager.java:908)] Loa >>>> ded 4 cubes, fail on 0 cubes >>>> 1023 >>>> 516 >>>> 576 >>>> 513 >>>> 528 >>>> 514 >>>> 544 >>>> 520 >>>> 2017-03-31 15:32:47,742 (Thread-0) [INFO - org.apache.hadoop >>>> .hbase.client.ConnectionManager$HConnectionImplementation.cl >>>> oseMasterService(ConnectionManager.java:2259)] Closing maste >>>> r protocol: MasterService >>>> >>>> Question: >>>> How Aggregation Groups Work? I can not set single dim in >>>> aggregation? >>>> >>>> Thanks for you suggestion~~~ >>>> >>>> ------------------------------ >>>> bing...@iflytek..com <bing...@iflytek.com> >>>> >>> >>> >>> >>> -- >>> Best regards, >>> >>> Shaofeng Shi 史少锋 >>> >>> >> > > > -- > Best regards, > > Shaofeng Shi 史少锋 > >