[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061573#comment-17061573 ] Pierre Smits commented on OFBIZ-10953: -- Patch available via https://github.com/apache/ofbiz-plugins/pull/11 See OFBIZ-11414 > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Sub-task > Components: bi >Affects Versions: Release Branch 18.12, Release Branch 17.12, Trunk >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: 20200208_094840.jpg > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035131#comment-17035131 ] Pierre Smits commented on OFBIZ-10953: -- Ok. Thanks. > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035108#comment-17035108 ] Jacopo Cappellato commented on OFBIZ-10953: --- Hi [~pierresmits], this is not what I was trying to convey; quoting myself: {quote}In my opinion, rather than relaxing the "surrogate key" rule for dimension tables by introducing exceptions to it, it would be better to improve the implementation of surrogate keys in our BI component {quote} So, just to be super clear to avoid that my comment is misinterpreted: * I didn't say that this *needs* to be done before *anything is added to the BI component* * I was not talking about *corrections* that are required: the current code is not broken or wrong, I see this just as an opportunity for an improvement * this enhancement would involve the implementation of a new method, db independent, to define/generate numeric sequences: so it is not a matter of *correcting* field type or entity engine definitions * I mentioned this enhancement in this context because we were talking about the best practices around "surrogate keys" from the DWHT book and this is something that was in my mind since long: however it is off topic for this ticket, nor it is blocking any enhancements to be contributed to the BI component, and it was not my intention to shift the conversation away from the topic > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034591#comment-17034591 ] Pierre Smits commented on OFBIZ-10953: -- Hi [~jacopoc], So you're saying that before anything is added to the BI component, the field type definitions need to be corrected for it (in line with the pre-bigdata dogma, and before paradigm shifts involving Data Lakes and Data Vaults), plus the entity engine xml file needs to be corrected accordingly? > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032859#comment-17032859 ] Jacopo Cappellato commented on OFBIZ-10953: --- I know that in the third edition of the book the author introduced the possibility of an exception for the Date dimension: {quote}A dimension table is designed with one column serving as a unique primary key. The primary key cannot be the operational system's natural key [...] These dimension surrogate keys are simple integers, assigned in sequence, starting with the value 1, every time a new key is needed. The date dimension is exempt from the surrogate key rule; this highly predictable and stable dimension can use a more meaningful primary key [...]. {quote} However it is made clear that this exception should not be a shortcut to provide meaning to date fields in fact tables and instead can be introduced: {quote}*To facilitate partitioning*, the primary key of a date dimension can be more meaningful, such as an integer representing MMDD, instead of sequentially-assigned surrogate key. {quote} With that said, the golden rule is always the same: {quote}Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys. You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key. {quote} Pierre wrote: {quote}The latter means that in a production infrastructure the using company is penalised (performance and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other generically defined measurements (e.g. in the examples the *invoiceDateDimId*). {quote} No additional query is required to fetch, for example, date information; instead the star schema should include an additional join to link the fact table to the Date dimension table. This would not increase the number of records produces (which is upper limited by the rows and in fact table): star schemas are designed like this to perform best with single queries with several joins (from one fact to many dimensions). In my opinion, rather than relaxing the "surrogate key" rule for dimension tables by introducing exceptions to it, it would be better to improve the implementation of surrogate keys in our BI component: in fact they are not currently implemented as integer sequential numbers and are instead strings. Switching to sequential integers would greatly improve performance and shrink the size of many tables in the OLAP database. When the BI component was created, it was decided to use strings simply because it was a prototype (not intended for production), and since there is no support for a db independent way to generate integer sequences in OFBiz, we ended up using the available sequence generator utility for strings. > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032320#comment-17032320 ] Pierre Smits commented on OFBIZ-10953: -- The reasoning we applied to having a different primary key for the Uom table in the transaction db (an exception to the primary key definition for majority of the tables) applies also to the DWH. For those not having access to the Data Warehouse Toolkit book (which is the starting point for the project's approach to Bi through the same named component) the first line states: ??Surrogate keys are used to implement the primary keys of *almost all* dimension tables.?? So, which dimension tables can the project consider to be the exceptions? These are the generic dimension tables that are uniform in meaning across businesses, e.g. those data sets defined by standardisation bodies. Measurement dimension tables, like date, time, but also country, currency and all those other groups of records which are relating to records in the Uom table in the transaction db are such generic tables. These tables don't need a 'self defined' primary keys added, because the natural key is self-explanatory. That is why Kimball in his book suggests for the Date dimension table to use the ISO 8601 (specifically YYMMDD) for the date as the definition for the primary key. And the same is suggested for the Time dimension table (HHMM in that case). The validation for these exceptions are: # the natural key is self-explanatory, # data abstraction negatively impacts query performance and legibility. Compare following examples: {code:java} {code} vs {code:java} {code} The latter means that in a production infrastructure the using company is penalised (performance and cost-wise) with an additional query (for the *origCurrencyDimId*) to the currency dimension to retrieve the underlying explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other generically defined measurements (e.g. in the examples the *invoiceDateDimId*). This may not mean much in DHW implementations with a low number of records in fact tables. But when there a multitude of such fact tables with each having millions/billions of records in those tables using such keys of generic tables (these measurement tables), then it adds up significantly. > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031607#comment-17031607 ] Jacopo Cappellato commented on OFBIZ-10953: --- The many reasons to use surrogate keys as primary keys in dimension tables instead of natural keys are well articulated in the section titled "Surrogate Keys" at page 58 of the DWH Toolkit book. We can always consider to adopt exceptions to these rules if there are strong reasons for doing this, but I don't see them here and also, in my opinion, it is better to postpone exceptions when they are really required and instead implement the same, consistent, simple and clear patterns for all the dimensions since we are still building the foundations of this OLAP data model in the BI component and future contributors will use this code as templates to build new dimensions and facts. > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031527#comment-17031527 ] Pierre Smits commented on OFBIZ-10953: -- Currency definitions are not defined by the project, but rather by the ISO (see 1). And based on publicly available information(this may take some time to find) the meaning of a currency code does not change over time. As you can see in the referred webpage there have been several updates to the currency of Zimbabwe (formerly known as Rodesia), where each new currency got a new code while the name stayed the same. We can therefore safely assume/expect that over the lifespan of a country-currency relationship the currency stays unique. We thus don't need to worry about multiple records (both in the transaction database in the Uom table) as in the dwh in the currency dimensionTable) with duplicate natural keys. As a fact: this is not possible because the natural key can not exist more than once in the source table (UoM). In DWH and BI circles it is very common to use the natural key in dimension and fact table for currencies (as opposed to the business-own generated sequencedId for other transaction tables), as it reduces the need to have additional effort to set up - in star schemas and fact tables - the relation between the (sequenced) dimensionId and what the common meaning is (the natural key or currency code). Even with regards to the adopted load/update strategies (type 1, type 2, etc) the DWH toolkit book does not dictate that values in the dimensionId field of dimension tables must be of the sequence type. [1] https://en.wikipedia.org/wiki/ISO_4217 > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key
[ https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031495#comment-17031495 ] Jacopo Cappellato commented on OFBIZ-10953: --- Unfortunately it is not desirable to use uomId (i.e. the natural key) to populate the dimensionId (i.e. the primary key of the dimension table): in fact, based on the update strategy adopted for the dimension table (type 1, type 2 etc... as described in "The Datawarehouse Toolkit" book) the CurrencyDimension table may contain several rows with the same uomId as this is important for properly implementing drill-up and drill-down reports. > have CurrencyDimension have a dimensionId that is based on the natural key > -- > > Key: OFBIZ-10953 > URL: https://issues.apache.org/jira/browse/OFBIZ-10953 > Project: OFBiz > Issue Type: Improvement > Components: bi >Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12 >Reporter: Pierre Smits >Assignee: Pierre Smits >Priority: Major > Labels: CurrencyDimension, birt, currency, dimension, dwh > Attachments: OFBIZ-10953-BI.patch > > > Currently the record sequencer (delegator.getNextSeqId) is used to determine > the dimensionId for the CurrencyDimension. This is unnecessary as the uomId > from the UOM table can be used for currency. > It also makes it easier to set the foreign-key in fact tables by generating > it based on the date provided, than by retrieving the dimensionId based on a > retrieval through the getDimensionIdFromNaturalKey service. -- This message was sent by Atlassian Jira (v8.3.4#803005)