[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Billy(Yiming) Liu reassigned KYLIN-1576: ---------------------------------------- Assignee: Billy(Yiming) Liu (was: Yiming Liu) > Support of new join type in the Cube Model - Temporal Join > ---------------------------------------------------------- > > Key: KYLIN-1576 > URL: https://issues.apache.org/jira/browse/KYLIN-1576 > Project: Kylin > Issue Type: New Feature > Components: General > Affects Versions: Future > Reporter: Richard Calaba > Assignee: Billy(Yiming) Liu > Priority: Blocker > > There is a notion of time-dependent master data in many business scenarios. > Typically modeled with granularity 1 day (datefrom, dateto fields of type > DATE defining validity time of one master data record). Occasionally you can > think of lower granularity so use of TIMESTAMP can be also seen as an valid > scenario). Example of such master data definition could be: > Master Data / Dimension Table: > ========================= > KEY: PRODUCT_ID, DATE_TO, > NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION > - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is > assumed that DATE_TO <= DATE_TO and also that there are no overlapping > intervals (DATE_FROM, DATE_TO) for all PRODUCT master data > - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the > statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= > today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' > PRODUCT master data (description). The today/now is also being named as 'key > date'. > - now if I have transaction data (FACT table) of product sales, i.e: > SALES_DATE, PRODUCT_ID, STORE_ID, .... > I would like to show the Sold Products at Store at certain date and also show > the Description of the product at the date of product sale (assuming here > that there is product catalog which can be updated independently, but for > auditing purposes the original product description used during sale is needed > to be displayed/used). > The SQL for the temporal join would be then: > SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION > FROM SALES as S LEFT OUTER JOIN PRODUCT as P > ON S.PRODUCT_ID = P.PRODUCT_ID > AND S.SALES_DATE >= P.DATE_FROM AND > AND S.SALES_DATE <= P.DATE_TO > (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but > in this case it won't be the proper join - we need to show the product sales > even the description wasn't maintained in product master data) > (some more details for temporal joins - see i.e. here - > http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval > ) > This scenario can be supported by Kylin if following enhancement would be > done: > 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER > joins (suggesting name temporal (left outer/inner) join) which forces to > specify a 'key date' as a expression (column / constant / ...) from the FACT > table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP > table/ Those 2 validity fields are defining master data record validity > period. Supported types for those fields should be DATE, optionally TIMESTAMP > is also fine but rarely used in business scenarios. > Other option rather then defining new join type is to loosen the join > condition and allowing <= and >= operands to be used as part of the LOOKUP > join definition. > 2) The Cube Definition then needs to know the extension of the join type in > the cube model and needs to force the additional fields (key-date, > valid-from, valid-to) be part of the whole cube structure. Or alternatively > cube definition for derived dimensions can be extended to define a > "time-dependent derived lookup" similar way as described in the step 1) for > the suggested cube model join type extension. > 3) Very often the time-partition field of the cube which is being used for > incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be > hard-coded this way as this is not true for every scenario. -- This message was sent by Atlassian JIRA (v6.3.4#6332)