[jira] [Created] (KYLIN-1786) Frontend work for KYLIN-1313
Dong Li created KYLIN-1786:
--

Summary: Frontend work for KYLIN-1313
Key: KYLIN-1786
URL: https://issues.apache.org/jira/browse/KYLIN-1786
Project: Kylin
Issue Type: Improvement
Components: Web
Reporter: Dong Li
Assignee: Zhong,Jason
Priority: Minor
Attachments: 屏幕快照 2016-06-15 12.22.54.png

KYLIN-1313 introduced a measure called extendedcolumn, but it does not seem to be enabled in the Web UI; see the attached screenshot.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1786) Frontend work for KYLIN-1313
[ https://issues.apache.org/jira/browse/KYLIN-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Li updated KYLIN-1786:
---
Attachment: 屏幕快照 2016-06-15 12.22.54.png
[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330891#comment-15330891 ]

Richard Calaba edited comment on KYLIN-1576 at 6/15/16 1:04 AM:

One workaround - not generic - covers the case where the temporal join logic is composed of:
- an equality join on the entity ID (id1, ... id-N)
- 2 non-equality joins on the entity validity (date_from, date_to)

The workaround is to define a new fact table which includes the original fact table and the time-dependent attributes. So instead of the old join condition in the model:

    FROM fact_table
    LEFT OUTER JOIN time_dependent_attrs
      ON fact_table.id1 = time_dependent_attrs.id1
      (AND fact_table.id-N = time_dependent_attrs.id-N)*
      AND fact_table.transaction_date <= time_dependent_attrs.date_to
      AND fact_table.transaction_date >= time_dependent_attrs.date_from

you can define a new fact table this way to achieve the same logic:

    create table/view fact_table_new AS
    SELECT fact.*, timedep.attr1, timedep.attr2
    FROM fact_table AS fact
    LEFT OUTER JOIN time_dependent_attrs AS timedep
      ON fact.id1 = timedep.id1
      (AND fact.id-N = timedep.id-N)*
    WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to;

The drawback of this solution: you have to model all time-dependent attributes (if they are needed for grouping) as separate Normal dimensions, so Kylin cannot use the optimized logic for Derived dimensions. This solution is therefore practical only for a small number of time-dependent attributes.

If this JIRA ticket (shown as resolved) - https://issues.apache.org/jira/browse/KYLIN-1313 - works as described, you should be able to define those time-dependent attributes as part of a Derived dimension, but unfortunately I didn't find out how to do it in the UI in Kylin 1.5.2.

The 2nd workaround is (instead of creating a new fact table) to create a new dimension table (lookup table) where you map the records from the lookup to the keys of the original fact table:

    create table/view new_dim_table AS
    SELECT fact.id1, (fact.id-N)*, timedep.*
    FROM fact_table AS fact
    LEFT OUTER JOIN time_dependent_attrs AS timedep
      ON fact.id1 = timedep.id1
      (AND fact.id-N = timedep.id-N)*
    WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to;

And then you can use the Kylin Model to define:

    fact_table INNER JOIN new_dim_table (you do not have to use LEFT OUTER JOIN here anymore)
      ON fact_table.id1 = new_dim_table.id1
      (AND fact_table.id-N = new_dim_table.id-N)*

This way you get a Dim table of the same size as the Fact table, but you can still use the benefits of Derived dimensions in Kylin.
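A small sketch may help when deciding whether the rewritten view really matches the original join. It uses Python's built-in sqlite3 module purely as a stand-in for Hive (an assumption; SQLite is not Hive and accepts the non-equality ON condition directly), and the three fact rows and two attribute rows are invented for illustration. It compares the date condition placed in the ON clause with the same condition moved to WHERE.

```python
import sqlite3

# Hypothetical miniature tables: the names follow the comment above,
# the data is invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_table (id1 INTEGER, transaction_date TEXT, amount INTEGER);
CREATE TABLE time_dependent_attrs (id1 INTEGER, date_from TEXT, date_to TEXT, attr1 TEXT);
INSERT INTO fact_table VALUES
  (1, '2016-01-15', 10),  -- falls inside a validity interval
  (1, '2016-03-01', 20),  -- id1 exists, but no interval covers this date
  (2, '2016-01-15', 30);  -- id1 has no master data at all
INSERT INTO time_dependent_attrs VALUES
  (1, '2016-01-01', '2016-01-31', 'old'),
  (1, '2016-02-01', '2016-02-29', 'new');
""")

# Variant 1: the validity condition is part of the LEFT OUTER JOIN.
rows_on = conn.execute("""
SELECT fact.id1, fact.transaction_date, timedep.attr1
FROM fact_table AS fact
LEFT OUTER JOIN time_dependent_attrs AS timedep
  ON fact.id1 = timedep.id1
 AND fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to
ORDER BY fact.id1, fact.transaction_date
""").fetchall()

# Variant 2: equality join only, validity condition moved to WHERE.
rows_where = conn.execute("""
SELECT fact.id1, fact.transaction_date, timedep.attr1
FROM fact_table AS fact
LEFT OUTER JOIN time_dependent_attrs AS timedep
  ON fact.id1 = timedep.id1
WHERE fact.transaction_date BETWEEN timedep.date_from AND timedep.date_to
ORDER BY fact.id1, fact.transaction_date
""").fetchall()

print(len(rows_on), rows_on)
print(len(rows_where), rows_where)
```

In this sketch the WHERE variant keeps only the fact rows that have a valid attribute record, because a comparison against NULL is never true. If every fact row is guaranteed to have valid master data the two variants agree; otherwise the unmatched fact rows would need to be handled separately.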
[jira] [Commented] (KYLIN-1313) Enable deriving dimensions on non PK/FK
[ https://issues.apache.org/jira/browse/KYLIN-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330963#comment-15330963 ]

Richard Calaba commented on KYLIN-1313:
---

Similar question - playing with Kylin 1.5.2 today, I didn't see anywhere in the UI the ability to specify that a Dimension can be Derived from the Fact table. Where is it?

> Enable deriving dimensions on non PK/FK
> ---
>
> Key: KYLIN-1313
> URL: https://issues.apache.org/jira/browse/KYLIN-1313
> Project: Kylin
> Issue Type: Improvement
> Reporter: hongbin ma
> Assignee: hongbin ma
> Fix For: v1.5.2
>
> Currently a derived column has to be a column on a lookup table, and the
> derived host column has to be a PK/FK (which is also a problem when the
> lookup table grows very large). Sometimes columns on the fact table exhibit
> a deriving relationship too. Here's an example fact table:
> (dt date, seller_id bigint, seller_name varchar(100), item_id bigint,
> item_url varchar(1000), count decimal, price decimal)
> seller_name is uniquely determined by each seller_id, and item_url is
> uniquely determined by each item_id. Users do not expect to filter on
> columns like seller_name or item_url; they just want to retrieve them when
> they group/filter on other dimensions like seller_id, item_id or even other
> dimensions like dt.
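The deriving relationship described in the issue only holds if each host value maps to exactly one derived value. A minimal sketch of how one might check that on sample data before relying on it is shown here; the function name `violates_dependency` and the sample rows are hypothetical and are not part of Kylin.

```python
from collections import defaultdict


def violates_dependency(rows, host, derived):
    """Return the host values that map to more than one derived value.

    An empty result means `derived` is uniquely determined by `host`
    in the given rows (for example seller_id -> seller_name).
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[host]].add(row[derived])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Invented sample of the example fact table (only the relevant columns).
rows = [
    {"seller_id": 1, "seller_name": "alice", "item_id": 10, "item_url": "http://example.com/10"},
    {"seller_id": 1, "seller_name": "alice", "item_id": 11, "item_url": "http://example.com/11"},
    {"seller_id": 2, "seller_name": "bob", "item_id": 10, "item_url": "http://example.com/10"},
]

# A row that breaks the seller_id -> seller_name relationship.
bad_rows = rows + [
    {"seller_id": 2, "seller_name": "robert", "item_id": 12, "item_url": "http://example.com/12"},
]

print(violates_dependency(rows, "seller_id", "seller_name"))
print(violates_dependency(bad_rows, "seller_id", "seller_name"))
```

The same check applies to item_id and item_url; on real data it would more likely be run as a GROUP BY ... HAVING COUNT(DISTINCT ...) > 1 query than in Python.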
[jira] [Commented] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330956#comment-15330956 ]

Richard Calaba commented on KYLIN-1576:
---

Interestingly - I just found out that Apache Drill supports this scenario - at least 2 years ago they patched it to support this if at least one equi-join condition is used: https://issues.apache.org/jira/browse/DRILL-485

So questions:
1) Why can't Hive implement the same? It seems the argument in the docs doesn't hold anymore ...
2) Should / can the Kylin cube build use Apache Drill while building the data cubes?

> Support of new join type in the Cube Model - Temporal Join
> --
>
> Key: KYLIN-1576
> URL: https://issues.apache.org/jira/browse/KYLIN-1576
> Project: Kylin
> Issue Type: New Feature
> Components: General
> Affects Versions: Future
> Reporter: Richard Calaba
> Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios.
> It is typically modeled with a granularity of 1 day (datefrom, dateto fields
> of type DATE defining the validity time of one master data record).
> Occasionally a finer granularity is needed, so use of TIMESTAMP can also be
> seen as a valid scenario. An example of such a master data definition could be:
>
> Master Data / Dimension Table:
> =
> KEY: PRODUCT_ID, DATE_TO
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
>
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day, it
>   is assumed that DATE_FROM <= DATE_TO and also that there are no overlapping
>   intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so that the
>   statement
>     SELECT * FROM PRODUCT WHERE PRODUCT_ID = 'prod_key_1'
>       AND DATE_TO >= today/now AND DATE_FROM <= today/now
>   is an efficient way to retrieve the 'current' PRODUCT master data
>   (description). The today/now value is also called the 'key date'.
> - now if I have transaction data (FACT table) of product sales, e.g.:
>     SALES_DATE, PRODUCT_ID, STORE_ID,
>   I would like to show the Sold Products at a Store at a certain date and
>   also show the Description of the product at the date of the product sale
>   (assuming here that there is a product catalog which can be updated
>   independently, but for auditing purposes the original product description
>   used during the sale needs to be displayed/used).
>
> The SQL for the temporal join would then be:
>   SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
>   FROM SALES AS S LEFT OUTER JOIN PRODUCT AS P
>     ON S.PRODUCT_ID = P.PRODUCT_ID
>     AND S.SALES_DATE >= P.DATE_FROM
>     AND S.SALES_DATE <= P.DATE_TO
> (an INNER TEMPORAL JOIN can also be defined and be valid in some scenarios,
> but in this case it wouldn't be the proper join - we need to show the product
> sales even if the description wasn't maintained in the product master data)
> (for some more details on temporal joins see e.g.
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
> )
>
> This scenario can be supported by Kylin if the following enhancements are
> made:
> 1) The Cube Model allows defining a special variant of the LEFT OUTER and
> INNER joins (suggested name: temporal (left outer/inner) join) which forces
> you to specify a 'key date' as an expression (column / constant / ...) from
> the FACT table and 2 validity fields ('valid from' and 'valid to') from the
> LOOKUP table. Those 2 validity fields define the master data record validity
> period. Supported types for those fields should be DATE; optionally TIMESTAMP
> is also fine but rarely used in business scenarios.
> The other option, rather than defining a new join type, is to loosen the join
> condition and allow the <= and >= operators to be used as part of the LOOKUP
> join definition.
> 2) The Cube Definition then needs to know about the extension of the join
> type in the cube model and needs to force the additional fields (key-date,
> valid-from, valid-to) to be part of the whole cube structure. Alternatively,
> the cube definition for derived dimensions can be extended to define a
> "time-dependent derived lookup" in a similar way as described in step 1) for
> the suggested cube model join type extension.
> 3) Very often the time-partition field of the cube, which is used for
> incremental data loads to cubes, will be the 'key-date'. BUT this shouldn't
> be hard-coded this way as this is not true for every scenario.
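The issue description relies on two assumptions about the master data: DATE_FROM is not after DATE_TO, and validity intervals of one PRODUCT_ID do not overlap. A minimal sketch of how those assumptions could be checked is shown here; the function name `check_validity_intervals`, the tuple layout and the sample records are hypothetical, and intervals are treated as inclusive on both ends, which is itself an assumption.

```python
from collections import defaultdict
from datetime import date


def check_validity_intervals(records):
    """Check (product_id, date_from, date_to) records for basic consistency.

    Returns a list of problems. An empty list means every interval is well
    formed and no two intervals of the same product overlap.
    """
    problems = []
    by_product = defaultdict(list)
    for product_id, date_from, date_to in records:
        if date_from > date_to:
            problems.append((product_id, "date_from after date_to", date_from, date_to))
        by_product[product_id].append((date_from, date_to))

    for product_id, intervals in by_product.items():
        intervals.sort()
        # After sorting by start date, any overlap shows up between neighbours.
        for (_, prev_to), (next_from, _) in zip(intervals, intervals[1:]):
            if next_from <= prev_to:  # inclusive ends: sharing a day counts as overlap
                problems.append((product_id, "overlap", prev_to, next_from))
    return problems


good = [
    ("p1", date(2016, 1, 1), date(2016, 1, 31)),
    ("p1", date(2016, 2, 1), date(9999, 12, 31)),
]
bad = [
    ("p1", date(2016, 1, 1), date(2016, 2, 15)),
    ("p1", date(2016, 2, 1), date(2016, 3, 1)),
]

print(check_validity_intervals(good))
print(check_validity_intervals(bad))
```

If the check fails, a temporal join can return more than one description for a single sale, which would silently multiply fact rows in the cube.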
[jira] [Commented] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330573#comment-15330573 ] Richard Calaba commented on KYLIN-1576: --- Houston, we have a problem - according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins "Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job. Also, more than two tables can be joined in Hive." Any ideas ... ? > Support of new join type in the Cube Model - Temporal Join > -- > > Key: KYLIN-1576 > URL: https://issues.apache.org/jira/browse/KYLIN-1576 > Project: Kylin > Issue Type: New Feature > Components: General >Affects Versions: Future >Reporter: Richard Calaba >Priority: Blocker > > There is a notion of time-dependent master data in many business scenarios. > Typically modeled with granularity 1 day (datefrom, dateto fields of type > DATE defining validity time of one master data record). Occasionally you can > think of lower granularity so use of TIMESTAMP can be also seen as an valid > scenario). Example of such master data definition could be: > Master Data / Dimension Table: > = > KEY: PRODUCT_ID, DATE_TO, > NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION > - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is > assumed that DATE_TO <= DATE_TO and also that there are no overlapping > intervals (DATE_FROM, DATE_TO) for all PRODUCT master data > - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the > statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= > today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' > PRODUCT master data (description). The today/now is also being named as 'key > date'. 
> - now if I have transaction data (FACT table) of product sales, i.e.:
> SALES_DATE, PRODUCT_ID, STORE_ID,
> I would like to show the Sold Products at Store at a certain date and also show
> the Description of the product at the date of product sale (assuming here
> that there is a product catalog which can be updated independently, but for
> auditing purposes the original product description used during the sale needs
> to be displayed/used).
> The SQL for the temporal join would then be:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P
> ON S.PRODUCT_ID = P.PRODUCT_ID
> AND S.SALES_DATE >= P.DATE_FROM
> AND S.SALES_DATE <= P.DATE_TO
> (an INNER TEMPORAL JOIN can also be defined and be valid in some scenarios, but
> in this case it won't be the proper join - we need to show the product sales
> even if the description wasn't maintained in product master data)
> (for some more details on temporal joins, see e.g.
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
> )
> This scenario could be supported by Kylin if the following enhancements were
> made:
> 1) The Cube Model allows defining a special variant of LEFT OUTER and INNER
> joins (suggested name: temporal (left outer/inner) join) which forces the user to
> specify a 'key date' as an expression (column / constant / ...) from the FACT
> table and 2 validity fields ('valid from' and 'valid to') from the LOOKUP
> table. Those 2 validity fields define the master data record validity
> period. Supported types for those fields should be DATE; optionally TIMESTAMP
> is also fine but rarely used in business scenarios.
> Another option, rather than defining a new join type, is to loosen the join
> condition and allow the <= and >= operators to be used as part of the LOOKUP
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in
> the cube model and needs to force the additional fields (key-date,
> valid-from, valid-to) to be part of the whole cube structure. Alternatively, the
> cube definition for derived dimensions can be extended to define a
> "time-dependent derived lookup" in a similar way as described in step 1) for
> the suggested cube model join type extension.
> 3) Very often the time-partition field of the cube, which is used for
> incremental data loads to cubes, will be the 'key-date'. BUT this shouldn't be
> hard-coded this way as this is not true for every scenario.
[jira] [Commented] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join
[ https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330539#comment-15330539 ]

Richard Calaba commented on KYLIN-1576:
---------------------------------------
Did a quick investigation - the problem is that the = condition is assumed by the Model/UI and by the code which generates the FROM clause, so to correctly patch this:

a) The Model JSON needs to be enhanced to allow join operands other than = . I would suggest adding an optional model JSON node - join-operand - which, if missing/empty, defaults to '=' and otherwise can contain '<', '>', '<=', '>=' for now ...

b) The UI needs to be enriched to allow overriding the default = join operand - I believe the main coding is in:
kylin-master/webapp/app/partials/modelDesigner/data_model.html
and
kylin-master\webapp\app\js\controllers\cubeModel.js

c) The Java code responsible for generating the FROM clause also needs to be enhanced:
- code in the FROM clause generation logic: kylin-master\core-job\src\main\java\org\apache\kylin\job\JoinedFlatTable.java - defaults to using '=' as the join operator
- code in class kylin-master\core-metadata\src\main\java\org\apache\kylin\metadata\model\JoinDesc.java - to support conditions other than =

public class JoinDesc {
    // inner, left, right, outer...
    @JsonProperty("type")
    private String type;
    @JsonProperty("primary_key")
    private String[] primaryKey;
    @JsonProperty("foreign_key")
    private String[] foreignKey;
    private TblColRef[] primaryKeyColumns;
    private TblColRef[] foreignKeyColumns;

So we might need to add one more JSON property, private String[] join_operator - if missing / empty it defaults to =, otherwise the array length needs to be the same as the primaryKey and foreignKey length ...

d) Then check that Build & Query work fine

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
> Key: KYLIN-1576
> URL: https://issues.apache.org/jira/browse/KYLIN-1576
> Project: Kylin
> Issue Type: New Feature
> Components: General
> Affects Versions: Future
> Reporter: Richard Calaba
> Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios.
> Typically modeled with granularity 1 day (datefrom, dateto fields of type
> DATE defining validity time of one master data record). Occasionally you can
> think of finer granularity, so use of TIMESTAMP can also be seen as a valid
> scenario. Example of such a master data definition could be:
> Master Data / Dimension Table:
> KEY: PRODUCT_ID, DATE_TO
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day, it is
> assumed that DATE_FROM <= DATE_TO and also that there are no overlapping
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the
> statement SELECT * from PRODUCT WHERE PRODUCT_ID = 'prod_key_1' AND DATE_TO >=
> today/now and DATE_FROM <= today/now is an efficient way to retrieve the 'current'
> PRODUCT master data (description). The today/now is also called the 'key
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e.:
> SALES_DATE, PRODUCT_ID, STORE_ID,
> I would like to show the Sold Products at Store at a certain date and also show
> the Description of the product at the date of product sale (assuming here
> that there is a product catalog which can be updated independently, but for
> auditing purposes the original product description used during the sale needs
> to be displayed/used).
> The SQL for the temporal join would then be:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P
> ON S.PRODUCT_ID = P.PRODUCT_ID
> AND S.SALES_DATE >= P.DATE_FROM
> AND S.SALES_DATE <= P.DATE_TO
> (an INNER TEMPORAL JOIN can also be defined and be valid in some scenarios, but
> in this case it won't be the proper join - we need to show the product sales
> even if the description wasn't maintained in product master data)
> (for some more details on temporal joins, see e.g.
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
> )
> This scenario could be supported by Kylin if the following enhancements were
> made:
> 1) The Cube Model allows defining a special variant of LEFT OUTER and INNER
> joins (suggested name: temporal (left outer/inner) join) which forces the user to
> specify a 'key date' as an expression (column / constant / ...) from the FACT
> table and 2 validity fields ('valid from' and 'valid to') from the LOOKUP
> table. Those 2 validity fields define the master data record validity
> period. Supported types for those fields should be DATE,
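The per-key join operator suggested in the KYLIN-1576 investigation comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `JoinSpec`, its constructor, and `toOnClause` are hypothetical names and a simplified stand-in for `JoinDesc`, not Kylin's actual API. A missing or empty operator array defaults every key pair to `=`; otherwise its length must match the key arrays.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for Kylin's JoinDesc; not the real class.
final class JoinSpec {
    // Operators suggested in the comment: '=' plus the four range comparisons.
    private static final String[] ALLOWED = {"=", "<", ">", "<=", ">="};

    final String type;           // e.g. "left" or "inner"
    final String[] primaryKey;   // columns of the lookup table
    final String[] foreignKey;   // columns of the fact table
    final String[] joinOperator; // one operator per key pair

    JoinSpec(String type, String[] primaryKey, String[] foreignKey, String[] joinOperator) {
        if (primaryKey.length != foreignKey.length) {
            throw new IllegalArgumentException("primary_key and foreign_key lengths differ");
        }
        this.type = type;
        this.primaryKey = primaryKey;
        this.foreignKey = foreignKey;
        this.joinOperator = normalize(joinOperator, primaryKey.length);
    }

    // A missing or empty operator list defaults to '=' for every key pair.
    private static String[] normalize(String[] ops, int keyCount) {
        if (ops == null || ops.length == 0) {
            String[] defaults = new String[keyCount];
            Arrays.fill(defaults, "=");
            return defaults;
        }
        if (ops.length != keyCount) {
            throw new IllegalArgumentException("join_operator length must match the key arrays");
        }
        for (String op : ops) {
            if (!Arrays.asList(ALLOWED).contains(op)) {
                throw new IllegalArgumentException("unsupported join operator: " + op);
            }
        }
        return ops.clone();
    }

    // Renders the ON condition as "fk op pk AND fk op pk ...".
    String toOnClause() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < primaryKey.length; i++) {
            if (i > 0) {
                sb.append(" AND ");
            }
            sb.append(foreignKey[i]).append(' ').append(joinOperator[i]).append(' ').append(primaryKey[i]);
        }
        return sb.toString();
    }
}
```

With the SALES/PRODUCT example from the issue, the operators `=`, `>=`, `<=` yield the temporal ON condition. Note that the other comment in this thread points out that Hive accepts only equality join conditions, so a clause generated this way may not run on Hive as-is.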
[jira] [Commented] (KYLIN-976) Support Custom Aggregation Types
[ https://issues.apache.org/jira/browse/KYLIN-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330218#comment-15330218 ]

Richard Calaba commented on KYLIN-976:
--------------------------------------
How should I understand that this issue is Resolved? I was trying to find out how a Custom Measure Calculation can be specified for a cube/model, but I didn't find any solution in the Kylin 1.5.x UI. The only solution I see is to use a Hive View on the Fact table where I can put custom calculations, and use this View as the new Fact Table ... Am I missing something?

> Support Custom Aggregation Types
> --------------------------------
>
> Key: KYLIN-976
> URL: https://issues.apache.org/jira/browse/KYLIN-976
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine, Query Engine
> Reporter: Luke Han
> Assignee: hongbin ma
> Fix For: v1.5.0, v1.3.0
>
> Currently, Kylin supports 6 basic aggregation measure functions:
> Min/Max/Sum/Count/Avg/DistinctCount
> But there are also many other cases that require support for more complicated
> measure expressions, for example:
> COUNT(CASE WHEN so.ft = 'fv' THEN soi.sc ELSE NULL END) or Sum(if...)
> Or even more complicated measures like TopN and RawRecords
> This JIRA is opened to track further implementation
[jira] [Assigned] (KYLIN-1785) NoSuchElementException when Mandatory Dimensions contains all Dimensions
[ https://issues.apache.org/jira/browse/KYLIN-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hongbin ma reassigned KYLIN-1785:
---------------------------------
Assignee: hongbin ma

> NoSuchElementException when Mandatory Dimensions contains all Dimensions
> ------------------------------------------------------------------------
>
> Key: KYLIN-1785
> URL: https://issues.apache.org/jira/browse/KYLIN-1785
> Project: Kylin
> Issue Type: Bug
> Affects Versions: v1.5.1
> Reporter: guohuili
> Assignee: hongbin ma
>
> When {{Mandatory Dimensions}} includes all dimensions in the Aggregation
> Groups, {{NoSuchElementException}} will be thrown in the {{Build N-Dimension Cuboid
> Data}} step (or the {{Build Cube}} steps if in-mem cubing):
> {code}
> 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1419528284) loading DictionaryInfo(loadDictObj:true) at /dict/FUJIAN.HTTP_10T_PARTITION/MODEL_NAME/a1437b13-e7f6-49dc-bad7-232f80535f9a.dict
> 2016-06-13 11:46:13,847 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
> 2016-06-13 11:46:13,918 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
> 2016-06-13 11:46:14,895 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException
>     at java.util.ArrayList$Itr.next(ArrayList.java:834)
>     at java.util.Collections.min(Collections.java:665)
>     at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:201)
>     at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:125)
>     at org.apache.kylin.cube.cuboid.Cuboid.findById(Cuboid.java:67)
>     at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:148)
>     at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:49)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1793)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 2016-06-13 11:46:15,934 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}
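The top frames of the trace show `Collections.min` failing inside `Cuboid.translateToValidCuboid`. `Collections.min` throws `NoSuchElementException` when it is given an empty collection, which suggests (an inference from the trace, not something confirmed in this thread) that the list of candidates was empty. A minimal sketch of that failure mode and one possible guard follows; `CuboidPick`, `smallestOrDefault`, `minThrows`, and the fallback value are hypothetical and are not Kylin code.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper for illustration only; not part of Kylin.
final class CuboidPick {
    // Returns the smallest candidate, or the fallback when there are none,
    // instead of letting Collections.min fail on an empty list.
    static long smallestOrDefault(List<Long> candidates, long fallback) {
        if (candidates.isEmpty()) {
            return fallback;
        }
        return Collections.min(candidates);
    }

    // True when Collections.min rejects the list with NoSuchElementException.
    static boolean minThrows(List<Long> candidates) {
        try {
            Collections.min(candidates);
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }
}
```

Whether a fallback value or an earlier validation of the aggregation group is the right fix depends on the cuboid logic, which this sketch does not model.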
[jira] [Updated] (KYLIN-1785) NoSuchElementException when Mandatory Dimensions contains all Dimensions
[ https://issues.apache.org/jira/browse/KYLIN-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guohuili updated KYLIN-1785: Description: When {{Mandatory Dimensions}} included all dimensions in the Aggregation Groups, NoSuchElementException will thrown in Build N-Dimension Cuboid Data step(or Build Cube steps if in-mem cubing): {code} 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1419528284) loading DictionaryInfo(loadDictObj:true) at /dict/FUJIAN.HTTP_10T_PARTITION/MODEL_NAME/a1437b13-e7f6-49dc-bad7-232f80535f9a.dict 2016-06-13 11:46:13,847 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2016-06-13 11:46:13,918 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy] 2016-06-13 11:46:14,895 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException at java.util.ArrayList$Itr.next(ArrayList.java:834) at java.util.Collections.min(Collections.java:665) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:201) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:125) at org.apache.kylin.cube.cuboid.Cuboid.findById(Cuboid.java:67) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:148) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:49) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1793) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 2016-06-13 11:46:15,934 INFO [main] 
org.apache.hadoop.mapred.Task: Runnning cleanup for the task {code} was: When {{Mandatory Dimensions}} included all dimensions in the Aggregation Groups, NoSuchElementException will thrown in Build N-Dimension Cuboid Data step(or Build Cube steps if in-mem cubing): > NoSuchElementException when Mandatory Dimensions contains all Dimensions > > > Key: KYLIN-1785 > URL: https://issues.apache.org/jira/browse/KYLIN-1785 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2 >Reporter: guohuili > > When {{Mandatory Dimensions}} included all dimensions in the Aggregation > Groups, NoSuchElementException will thrown in Build N-Dimension Cuboid Data > step(or Build Cube steps if in-mem cubing): > {code} > 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: > DictionaryManager(1419528284) loading DictionaryInfo(loadDictObj:true) at > /dict/FUJIAN.HTTP_10T_PARTITION/MODEL_NAME/a1437b13-e7f6-49dc-bad7-232f80535f9a.dict > 2016-06-13 11:46:13,847 INFO [main] org.apache.hadoop.mapred.MapTask: > Starting flush of map output > 2016-06-13 11:46:13,918 INFO [main] org.apache.hadoop.io.compress.CodecPool: > Got brand-new compressor [.snappy] > 2016-06-13 11:46:14,895 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : java.util.NoSuchElementException > at java.util.ArrayList$Itr.next(ArrayList.java:834) > at java.util.Collections.min(Collections.java:665) > at > org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:201) > at > org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:125) > at org.apache.kylin.cube.cuboid.Cuboid.findById(Cuboid.java:67) > at > org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:148) > at > org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:49) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1793) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > 2016-06-13 11:46:15,934 INFO [main] org.apache.hadoop.mapred.Task: Runnning > cleanup for the task > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KYLIN-1785) NoSuchElementException when Mandatory Dimensions contains all Dimensions
[ https://issues.apache.org/jira/browse/KYLIN-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guohuili updated KYLIN-1785: Description: When {{Mandatory Dimensions}} included all dimensions in the Aggregation Groups, {{NoSuchElementException}} will thrown in {{Build N-Dimension Cuboid Data}} step(or {{Build Cube}} steps if in-mem cubing): {code} 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1419528284) loading DictionaryInfo(loadDictObj:true) at /dict/FUJIAN.HTTP_10T_PARTITION/MODEL_NAME/a1437b13-e7f6-49dc-bad7-232f80535f9a.dict 2016-06-13 11:46:13,847 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2016-06-13 11:46:13,918 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy] 2016-06-13 11:46:14,895 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException at java.util.ArrayList$Itr.next(ArrayList.java:834) at java.util.Collections.min(Collections.java:665) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:201) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:125) at org.apache.kylin.cube.cuboid.Cuboid.findById(Cuboid.java:67) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:148) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:49) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1793) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 2016-06-13 11:46:15,934 INFO [main] 
org.apache.hadoop.mapred.Task: Runnning cleanup for the task {code} was: When {{Mandatory Dimensions}} included all dimensions in the Aggregation Groups, NoSuchElementException will thrown in Build N-Dimension Cuboid Data step(or Build Cube steps if in-mem cubing): {code} 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1419528284) loading DictionaryInfo(loadDictObj:true) at /dict/FUJIAN.HTTP_10T_PARTITION/MODEL_NAME/a1437b13-e7f6-49dc-bad7-232f80535f9a.dict 2016-06-13 11:46:13,847 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2016-06-13 11:46:13,918 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy] 2016-06-13 11:46:14,895 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.util.NoSuchElementException at java.util.ArrayList$Itr.next(ArrayList.java:834) at java.util.Collections.min(Collections.java:665) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:201) at org.apache.kylin.cube.cuboid.Cuboid.translateToValidCuboid(Cuboid.java:125) at org.apache.kylin.cube.cuboid.Cuboid.findById(Cuboid.java:67) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:148) at org.apache.kylin.engine.mr.steps.NDCuboidMapper.map(NDCuboidMapper.java:49) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1793) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 2016-06-13 11:46:15,934 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task {code} > NoSuchElementException when Mandatory 
Dimensions contains all Dimensions > > > Key: KYLIN-1785 > URL: https://issues.apache.org/jira/browse/KYLIN-1785 > Project: Kylin > Issue Type: Bug >Affects Versions: v1.5.2 >Reporter: guohuili > > When {{Mandatory Dimensions}} included all dimensions in the Aggregation > Groups, {{NoSuchElementException}} will thrown in {{Build N-Dimension Cuboid > Data}} step(or {{Build Cube}} steps if in-mem cubing): > {code} > 2016-06-13 11:46:13,842 INFO [main] org.apache.kylin.dict.DictionaryManager: > DictionaryManager(1419528284) loading
[jira] [Created] (KYLIN-1785) NoSuchElementException when Mandatory Dimensions contains all Dimensions
guohuili created KYLIN-1785:
-------------------------------
Summary: NoSuchElementException when Mandatory Dimensions contains all Dimensions
Key: KYLIN-1785
URL: https://issues.apache.org/jira/browse/KYLIN-1785
Project: Kylin
Issue Type: Bug
Affects Versions: v1.5.2
Reporter: guohuili

When {{Mandatory Dimensions}} includes all dimensions in the Aggregation Groups, NoSuchElementException will be thrown in the Build N-Dimension Cuboid Data step (or the Build Cube steps if in-mem cubing):
[jira] [Resolved] (KYLIN-1760) org.apache.hadoop.hbase.TableNotFoundException: kylin_metadata_user
[ https://issues.apache.org/jira/browse/KYLIN-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyang resolved KYLIN-1760.
---------------------------
Resolution: Fixed
Fix Version/s: (was: Future) v1.5.3

> org.apache.hadoop.hbase.TableNotFoundException: kylin_metadata_user
> -------------------------------------------------------------------
>
> Key: KYLIN-1760
> URL: https://issues.apache.org/jira/browse/KYLIN-1760
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v1.5.1
> Reporter: jiangshouzhuang
> Assignee: liyang
> Fix For: v1.5.3
>
> When I use "http://ipaddress:7070/kylin/query" to save queries, I notice that the
> log file kylin.log reports some errors:
> org.apache.hadoop.hbase.TableNotFoundException: Table 'kylin_metadata_user' was not found, got: kylin_metadata.
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1156)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1140)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1097)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:932)
>     at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>     at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:889)
>     at org.apache.hadoop.hbase.client.HTable.get(HTable.java:855)
>     at org.apache.kylin.rest.service.QueryService.getQueries(QueryService.java:189)
>     at org.apache.kylin.rest.service.QueryService.saveQuery(QueryService.java:129)
>     at org.apache.kylin.rest.service.QueryService$$FastClassByCGLIB$$4957273f.invoke()
>     at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
>     at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:618)
>     at org.apache.kylin.rest.service.QueryService$$EnhancerByCGLIB$$67382972.saveQuery()
>     at org.apache.kylin.rest.controller.QueryController.saveQuery(QueryController.java:110)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213)
>     at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126)
>     at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96)
>     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617)
>     at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578)
> By analyzing the Kylin source code, I found the file
> apache-kylin-1.5.2\server\src\main\java\org\apache\kylin\rest\service\QueryService.java,
> so I created the table "kylin_metadata_user" in HBase.
> hbase(main):001:0> create 'kylin_metadata_user','q'
> 0 row(s) in 1.5880 seconds
> => Hbase::Table - kylin_metadata_user
> But I think that it may be a bug. Thank you.
[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed
[ https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329350#comment-15329350 ] Wang Ken commented on KYLIN-1590: - Hi, Yang The duplicated merge jobs with same range are triggered by unexpected call sequence to CubeManager.updateCube() by multiple threads(job thread and http threads). Thanks Ken > 2 Kylin Steaming merge jobs of same time range triggered and failed > > > Key: KYLIN-1590 > URL: https://issues.apache.org/jira/browse/KYLIN-1590 > Project: Kylin > Issue Type: Bug > Components: streaming >Affects Versions: v1.4.0 >Reporter: qianqiaoneng >Assignee: Zhong Yanghong >Priority: Critical > > 2 issues: > 1. Kylin allows 2 merge jobs with same time range running. > 2. when 2 merge jobs with same time range are running on the same time, they > mixed up metadata, always get the HTable not found error. > Build Result of Job site_gmb - 20160415212000_20160415215000 - MERGE - PDT > 2016-04-15 14:58:38 > Build Result: ERROR > Job Engine: *** > Cube Name: site_gmb > Source Records Count: 0 > Start Time: Fri Apr 15 14:58:44 PDT 2016 > Duration: 2mins > MR Waiting: 0mins > Last Update Time: Fri Apr 15 15:01:42 PDT 2016 > Submitter: SYSTEM > Error Log: org.apache.hadoop.hbase.TableNotFoundException: KYLIN_NB2J0SRADJ > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1299) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070) > at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:201) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:159) > at > 
org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:87) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:118) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > result code:2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KYLIN-1590) 2 Kylin Steaming merge jobs of same time range triggered and failed
[ https://issues.apache.org/jira/browse/KYLIN-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329340#comment-15329340 ]

Wang Ken commented on KYLIN-1590:
---------------------------------
It's caused by a multi-threading issue in CubeManager.updateCubeWithRetry. Different threads can enter this method and update the cube instance. The HBase resource update and the cubeMap (cache) update should be atomic.
The fix is to add a lock on the cube instance. It could hold multiple locks in case of retry, but there is no deadlock risk: it always acquires the lock on the stale cube first and then on the fresh cube.

CubeInstance _cube = null;
synchronized (cube) {
    try {
        getStore().putResource(cube.getResourcePath(), cube, CUBE_SERIALIZER);
    } catch (IllegalStateException ise) {
        logger.warn("Write conflict to update cube " + cube.getName() + " at try " + retry + ", will retry...");
        if (retry >= 7) {
            logger.error("Retried 7 times till got error, abandoning...", ise);
            throw ise;
        }
        update.setCubeInstance(reloadCubeLocal(cube.getName()));
        retry++;
        _cube = updateCubeWithRetry(update, retry);
    }
    if (_cube != null) {
        cube = _cube;
    } else {
        cubeMap.put(cube.getName(), cube);
    }
}

> 2 Kylin Steaming merge jobs of same time range triggered and failed
> -------------------------------------------------------------------
>
> Key: KYLIN-1590
> URL: https://issues.apache.org/jira/browse/KYLIN-1590
> Project: Kylin
> Issue Type: Bug
> Components: streaming
> Affects Versions: v1.4.0
> Reporter: qianqiaoneng
> Assignee: Zhong Yanghong
> Priority: Critical
>
> 2 issues:
> 1. Kylin allows 2 merge jobs with the same time range to run.
> 2. When 2 merge jobs with the same time range are running at the same time, they
> mix up metadata and always get the HTable not found error.
> Build Result of Job site_gmb - 20160415212000_20160415215000 - MERGE - PDT > 2016-04-15 14:58:38 > Build Result: ERROR > Job Engine: *** > Cube Name: site_gmb > Source Records Count: 0 > Start Time: Fri Apr 15 14:58:44 PDT 2016 > Duration: 2mins > MR Waiting: 0mins > Last Update Time: Fri Apr 15 15:01:42 PDT 2016 > Submitter: SYSTEM > Error Log: org.apache.hadoop.hbase.TableNotFoundException: KYLIN_NB2J0SRADJ > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1299) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1128) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1070) > at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:347) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:201) > at org.apache.hadoop.hbase.client.HTable.(HTable.java:159) > at > org.apache.kylin.storage.hbase.steps.CubeHFileJob.run(CubeHFileJob.java:87) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:118) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50) > at > org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:105) > at > org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at 
java.lang.Thread.run(Thread.java:745) > result code:2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
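The fix proposed in the KYLIN-1590 comment is to make the store write and the cache update happen under one lock. A minimal sketch of that idea is shown below, assuming two plain in-memory maps in place of Kylin's resource store and `cubeMap`. The class `CubeCacheSketch` and its methods are hypothetical, and the sketch locks on the manager object rather than on the cube instance, which is a simplification and not the patch from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the two maps stand in for the metadata store and the cube cache.
final class CubeCacheSketch {
    private final Map<String, Integer> store = new HashMap<>(); // persisted version per cube
    private final Map<String, Integer> cache = new HashMap<>(); // cached version per cube

    // Writes the new version to the store and refreshes the cache as one step,
    // so no thread can observe an updated store together with a stale cache.
    synchronized int update(String cubeName) {
        int next = store.getOrDefault(cubeName, 0) + 1;
        store.put(cubeName, next);
        cache.put(cubeName, next);
        return next;
    }

    synchronized int cachedVersion(String cubeName) {
        return cache.getOrDefault(cubeName, 0);
    }

    // Runs several updater threads against one manager and returns the final cached version.
    static int runConcurrently(int threads, int updatesPerThread) {
        CubeCacheSketch manager = new CubeCacheSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    manager.update("site_gmb");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return manager.cachedVersion("site_gmb");
    }
}
```

Because every update holds the same lock for both writes, no increment is lost and the cache never lags the store. Locking per cube, as the comment proposes, would allow updates to different cubes to proceed in parallel.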
[jira] [Commented] (KYLIN-1756) Allow user to run MR jobs against different Hadoop queues
[ https://issues.apache.org/jira/browse/KYLIN-1756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329204#comment-15329204 ]

Ma Gang commented on KYLIN-1756:
--------------------------------
already fixed in KYLIN-1706

> Allow user to run MR jobs against different Hadoop queues
> ---------------------------------------------------------
>
> Key: KYLIN-1756
> URL: https://issues.apache.org/jira/browse/KYLIN-1756
> Project: Kylin
> Issue Type: New Feature
> Reporter: qianqiaoneng
> Assignee: Ma Gang
> Fix For: v1.5.3
>
> Currently Kylin provides a batch account b_kylin to run cube build MR jobs on
> a centralized queue, which has some scalability issues. We need to extend the
> capacity of the centralized queue as more and more customers onboard to the Kylin
> server. This JIRA is opened to track the feature of giving Kylin users
> an additional option to use different Hadoop queues per project instead of
> the centralized queue.