[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-03-18 Thread Pierre Smits (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061573#comment-17061573
 ] 

Pierre Smits commented on OFBIZ-10953:
--

Patch available via https://github.com/apache/ofbiz-plugins/pull/11 See 
OFBIZ-11414

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Sub-task
>  Components: bi
>Affects Versions: Release Branch 18.12, Release Branch 17.12, Trunk
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: 20200208_094840.jpg
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-12 Thread Pierre Smits (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035131#comment-17035131
 ] 

Pierre Smits commented on OFBIZ-10953:
--

Ok. Thanks.

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-11 Thread Jacopo Cappellato (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035108#comment-17035108
 ] 

Jacopo Cappellato commented on OFBIZ-10953:
---

Hi [~pierresmits],

this is not what I was trying to convey; quoting myself:
{quote}In my opinion, rather than relaxing the "surrogate key" rule for 
dimension tables by introducing exceptions to it, it would be better to improve 
the implementation of surrogate keys in our BI component
{quote}
So, just to be super clear to avoid that my comment is misinterpreted:
 * I didn't say that this *needs* to be done before *anything is added to the 
BI component*
 * I was not talking about *corrections* that are required: the current code is 
not broken or wrong, I see this just as an opportunity for an improvement
 * this enhancement would involve the implementation of a new method, db 
independent, to define/generate numeric sequences: so it is not a matter of 
*correcting* field type or entity engine definitions
 * I mentioned this enhancement in this context because we were talking about 
the best practices around "surrogate keys" from the DWHT book and this is 
something that was in my mind since long: however it is off topic for this 
ticket, nor it is blocking any enhancements to be contributed to the BI 
component, and it was not my intention to shift the conversation away from the 
topic

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-11 Thread Pierre Smits (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034591#comment-17034591
 ] 

Pierre Smits commented on OFBIZ-10953:
--

Hi [~jacopoc],

So you're saying that before anything is added to the BI component, the field 
type definitions need to be corrected for it (in line with the pre-bigdata 
dogma, and before paradigm shifts involving Data Lakes and Data Vaults), plus 
the entity engine xml file needs to be corrected accordingly?

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: 20200208_094840.jpg, OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-08 Thread Jacopo Cappellato (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032859#comment-17032859
 ] 

Jacopo Cappellato commented on OFBIZ-10953:
---

I know that in the third edition of the book the author introduced the 
possibility of an exception for the Date dimension:
{quote}A dimension table is designed with one column serving as a unique 
primary key. The primary key cannot be the operational system's natural key 
[...] These dimension surrogate keys are simple integers, assigned in sequence, 
starting with the value 1, every time a new key is needed. The date dimension 
is exempt from the surrogate key rule; this highly predictable and stable 
dimension can use a more meaningful primary key [...]. 
{quote}
However it is made clear that this exception should not be a shortcut to 
provide meaning to date fields in fact tables and instead can be introduced:
{quote}*To facilitate partitioning*, the primary key of a date dimension can be 
more meaningful, such as an integer representing MMDD, instead of 
sequentially-assigned surrogate key.
{quote}
With that said, the golden rule is always the same:
{quote}Every join between dimension and fact tables in the data warehouse 
should be based on meaningless integer surrogate keys. You should avoid using 
the natural operational production codes. None of the data warehouse keys 
should be smart, where you can tell something about the row just by looking at 
the key.
{quote}
Pierre wrote:
{quote}The latter means that in a production infrastructure the using company 
is penalised (performance and cost-wise) with an additional query (for the 
*origCurrencyDimId*) to the currency dimension to retrieve the underlying 
explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other 
generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).
{quote}
No additional query is required to fetch, for example, date information; 
instead the star schema should include an additional join to link the fact 
table to the Date dimension table. This would not increase the number of 
records produces (which is upper limited by the rows and in fact table): star 
schemas are designed like this to perform best with single queries with several 
joins (from one fact to many dimensions).

In my opinion, rather than relaxing the "surrogate key" rule for dimension 
tables by introducing exceptions to it, it would be better to improve the 
implementation of surrogate keys in our BI component: in fact they are not 
currently implemented as integer sequential numbers and are instead strings. 
Switching to sequential integers would greatly improve performance and shrink 
the size of many tables in the OLAP database. When the BI component was 
created, it was decided to use strings simply because it was a prototype (not 
intended for production), and since there is no support for a db independent 
way to generate integer sequences in OFBiz, we ended up using the available 
sequence generator utility for strings.

 

 

 

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-07 Thread Pierre Smits (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032320#comment-17032320
 ] 

Pierre Smits commented on OFBIZ-10953:
--

The reasoning we applied to having a different primary key for the Uom table in 
the transaction db (an exception to the primary key definition for majority of 
the tables) applies also to the DWH.

For those not having access to the Data Warehouse Toolkit book (which is the 
starting point for the project's approach to Bi through the same named 
component) the first line states:

??Surrogate keys are used to implement the primary keys of *almost all* 
dimension tables.??

So, which dimension tables can the project consider to be the exceptions?
 These are the generic dimension tables that are uniform in meaning across 
businesses, e.g. those data sets defined by standardisation bodies. 
 Measurement dimension tables, like date, time, but also country, currency and 
all those other groups of records which are relating to records in the Uom 
table in the transaction db are such generic tables. These tables don't need a 
'self defined' primary keys added, because the natural key is self-explanatory.

That is why Kimball in his book suggests for the Date dimension table to use 
the ISO 8601 (specifically YYMMDD) for the date as the definition for the 
primary key. And the same is suggested for the Time dimension table (HHMM in 
that case).

The validation for these exceptions are:
 # the natural key is self-explanatory,
 # data abstraction negatively impacts query performance and legibility.

Compare following examples:
{code:java}

{code}
vs
{code:java}

{code}
The latter means that in a production infrastructure the using company is 
penalised (performance and cost-wise) with an additional query (for the 
*origCurrencyDimId*) to the currency dimension to retrieve the underlying 
explanation/meaning (EUR). And similarly for the *quantityUomDimId* and other 
generically defined measurements (e.g. in the examples the  *invoiceDateDimId*).

This may not mean much in DHW implementations with a low number of records in 
fact tables. But when there a multitude of such fact tables with each having 
millions/billions of records in those tables using such keys of generic tables 
(these measurement tables), then it adds up significantly. 

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-06 Thread Jacopo Cappellato (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031607#comment-17031607
 ] 

Jacopo Cappellato commented on OFBIZ-10953:
---

The many reasons to use surrogate keys as primary keys in dimension tables 
instead of natural keys are well articulated in the section titled "Surrogate 
Keys" at page 58 of the DWH Toolkit book.

We can always consider to adopt exceptions to these rules if there are strong 
reasons for doing this, but I don't see them here and also, in my opinion, it 
is better to postpone exceptions when they are really required and instead 
implement the same, consistent, simple and clear patterns for all the 
dimensions since we are still building the foundations of this OLAP data model 
in the BI component and future contributors will use this code as templates to 
build new dimensions and facts.

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-06 Thread Pierre Smits (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031527#comment-17031527
 ] 

Pierre Smits commented on OFBIZ-10953:
--

Currency definitions are not defined by the project, but rather by the ISO (see 
1). And based on publicly available information(this may take some time to 
find) the meaning of a currency code does not change over time. As you can see 
in the referred webpage there have been several updates to the currency of 
Zimbabwe (formerly known as Rodesia), where each new currency got a new code 
while the name stayed the same. 

We can therefore safely assume/expect that over the lifespan of a 
country-currency relationship the currency stays unique. We thus don't need to 
worry about multiple records (both in the transaction database in the Uom 
table) as in the dwh in the currency dimensionTable) with duplicate natural 
keys. As a fact: this is not possible because the natural key can not exist 
more than once in the source table (UoM).

In DWH and BI circles it is very common to use the natural key in dimension and 
fact table for currencies (as opposed to the business-own generated sequencedId 
for other transaction tables), as it reduces the need to have additional effort 
to set up - in star schemas and fact tables  - the relation between the 
(sequenced) dimensionId and what the common meaning is (the natural key or 
currency code).

Even with regards to the adopted load/update strategies (type 1, type 2, etc) 
the DWH toolkit book  does not dictate that values in the dimensionId field of 
dimension tables must be of the sequence type.

 

 

[1] https://en.wikipedia.org/wiki/ISO_4217

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OFBIZ-10953) have CurrencyDimension have a dimensionId that is based on the natural key

2020-02-06 Thread Jacopo Cappellato (Jira)


[ 
https://issues.apache.org/jira/browse/OFBIZ-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031495#comment-17031495
 ] 

Jacopo Cappellato commented on OFBIZ-10953:
---

Unfortunately it is not desirable to use uomId (i.e. the natural key) to 
populate the dimensionId (i.e. the primary key of the dimension table): in 
fact, based on the update strategy adopted for the dimension table (type 1, 
type 2 etc... as described in "The Datawarehouse Toolkit" book) the 
CurrencyDimension table may contain several rows with the same uomId as this is 
important for properly implementing drill-up and drill-down reports.

> have CurrencyDimension have a dimensionId that is based on the natural key
> --
>
> Key: OFBIZ-10953
> URL: https://issues.apache.org/jira/browse/OFBIZ-10953
> Project: OFBiz
>  Issue Type: Improvement
>  Components: bi
>Affects Versions: Trunk, Release Branch 17.12, Release Branch 18.12
>Reporter: Pierre Smits
>Assignee: Pierre Smits
>Priority: Major
>  Labels: CurrencyDimension, birt, currency, dimension, dwh
> Attachments: OFBIZ-10953-BI.patch
>
>
> Currently the record sequencer (delegator.getNextSeqId) is used to determine 
> the dimensionId for the CurrencyDimension. This is unnecessary as the uomId 
> from the UOM table can be used for currency.
> It also makes it easier to set the foreign-key in fact tables by generating 
> it based on the date provided, than by retrieving the dimensionId based on a 
> retrieval through the getDimensionIdFromNaturalKey service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)