[ https://issues.apache.org/jira/browse/KYLIN-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roger Shi updated KYLIN-187:
----------------------------
    Summary: Data Statistics Collection and Auto Modeling  (was: Data Statistics Analyzer)

> Data Statistics Collection and Auto Modeling
> ---------------------------------------------
>
>                 Key: KYLIN-187
>                 URL: https://issues.apache.org/jira/browse/KYLIN-187
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Tools, Build and Test
>            Reporter: Luke Han
>              Labels: github-import
>             Fix For: Backlog
>
>
> 1 Overview
> We need statistics data for the following purposes:
> * Design cube metadata based on the query log
> * Design the HBase row key based on data distribution (e.g. histogram and
> cardinality)
> * Choose an execution plan based on cuboid data
>
> 2 Data Analyzer
> We need to analyze the Hive data and the cube data in two phases. First, we
> analyze the Hive data to guide the first-round design of the row key. Then we
> analyze the cube data to refine the row-key design and to estimate the
> cost of a query.
>
> 2.1 Analyze Hive Data
> We need to collect the following statistics on the Hive table:
> * Cardinality of each dimension
> * Cardinality of dimension combinations (optional)
> * Value distribution of each dimension (optional)
> Based on the statistics of the Hive data, we can design the row key groups,
> ordering dimensions from high cardinality to low cardinality. We should also
> split the dimensions evenly across the row key groups, which reduces the
> number of cuboids.
>
> 2.2 Analyze Cube Data
> We need to collect the following statistics on the data cube:
> * Count of each cuboid
> * Group ratio of each cuboid = current cuboid count / lower group base cuboid
> count
>
> 3 Query Analyzer
> TBD
>
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/KylinOLAP/Kylin/issues/318
> Created by: [lukehan|https://github.com/lukehan]
> Labels: newfeature
> Milestone: Backlog
> Created at: Fri Dec 26 15:21:24 CST 2014
> State: open

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
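
The statistics named in sections 2.1 and 2.2 of the issue can be illustrated with a small in-memory sketch. This is not Kylin code: the function names, the sample rows, and the dimension names are all hypothetical. The issue does not define "lower group base cuboid", so the sketch simply takes the base count as a parameter.

```python
from itertools import combinations


def dimension_cardinalities(rows, dimensions):
    """Number of distinct values per dimension (section 2.1)."""
    return {d: len({row[d] for row in rows}) for d in dimensions}


def order_by_cardinality(cardinalities):
    """Dimensions sorted from high to low cardinality; ties broken by name."""
    return sorted(cardinalities, key=lambda d: (-cardinalities[d], d))


def cuboid_counts(rows, dimensions):
    """Distinct-row count for every non-empty dimension combination (section 2.2)."""
    counts = {}
    for k in range(1, len(dimensions) + 1):
        for combo in combinations(dimensions, k):
            counts[combo] = len({tuple(row[d] for d in combo) for row in rows})
    return counts


def group_ratio(cuboid_count, base_count):
    """Ratio of a cuboid's count to a chosen base cuboid's count.

    Assumption: which cuboid serves as the base is left to the caller.
    """
    return cuboid_count / base_count


# Hypothetical sample data standing in for a Hive table.
rows = [
    {"country": "US", "city": "NYC", "device": "web"},
    {"country": "US", "city": "SF", "device": "web"},
    {"country": "CN", "city": "SH", "device": "app"},
    {"country": "CN", "city": "BJ", "device": "web"},
]
dims = ["country", "city", "device"]

cards = dimension_cardinalities(rows, dims)
row_key_order = order_by_cardinality(cards)
counts = cuboid_counts(rows, dims)
ratio = group_ratio(counts[("country",)], counts[tuple(dims)])
```

On a real table the distinct counts would come from Hive queries or an approximate counter rather than Python sets; the sketch only shows what each statistic measures.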