Re: Data Import

David E. Jones Sun, 15 Apr 2007 09:36:03 -0700

I'm not sure what you used to find loops in the graph, but I know just off the top of my head that there are others, and I'm guessing quite a few. Actually, a highly normalized data model that fulfills a large number of different requirements probably tends to cause more loops, not fewer.


Some off the top of my head real quick:

- ProductCategory points to self
- ProductStore points to Facility, Facility points to ProductStore

Certain other ones are loops but are actually workable, like ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) -> ProductCategoryRollup -> ProductCategory(1) which is okay because you can do the ProductCategory records first and then do the rollups.

One thing to consider is that there are certain loops that can possibly be resolved by sorting records instead of entities. That makes things significantly more complicated, but for certain scenarios it's the only solution.

If you really want to go for this and try to code something that will sort both entities and records within entities to get something that will handle most but not all cases, go for it, I won't stop you.

I'm just warning you that I looked into this once, did the background work, and found it to be a tough nut to crack, probably not worth the effort as other solutions are easier. Also, this has been tried before by other people who made some progress, but when they got into testing with real data and such they ran into problem after problem and never found a workable solution.

Maybe the best thing for me to do is shut up and let you give it a try. Let me know how it goes.


-David


On Apr 15, 2007, at 4:22 AM, Chris Howe wrote:


--- "David E. Jones" <[EMAIL PROTECTED]> wrote:


Yeah, there are various loops: self-referencing entities (A -> A) and
multi-entity loops (A->B->A; A->B->C->A; etc).
-David


So, I went ahead and wrote a script to walk and order the entities and
it turns out there are only two loops which are actually more like
kinks (granted it takes 17 passes to reduce the relationships to these
two loops, but it does get there).  Knowing the order of entering
entity data that won't fail and need to be retried on subsequent passes
will more than make up for the three minutes of processing time it
takes to determine.
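
A minimal sketch of that kind of pass-based ordering, in Python. This is
not the script described above; the function name and the shape of the
input (a dict from entity name to the set of entities it references) are
assumptions made for illustration. Each pass releases every entity whose
references have all been ordered already, and whatever cannot be released
is left over as part of a loop or dependent on one.

```python
def order_entities(deps):
    """Order entities so that each one comes after the entities it references.

    deps is a hypothetical input: entity name -> set of referenced entity names.
    Returns (ordered, stuck, passes).
    """
    # Self-references are dropped here; they are a record-ordering problem,
    # not an entity-ordering problem.
    remaining = {e: {d for d in ds if d != e} for e, ds in deps.items()}
    ordered, passes = [], 0
    while remaining:
        # Ready means: nothing this entity references is still waiting.
        # References to entities not listed in deps count as satisfied.
        ready = sorted(e for e, ds in remaining.items() if ds.isdisjoint(remaining))
        if not ready:
            break  # everything left is in a loop or depends on one
        passes += 1
        ordered.extend(ready)
        for e in ready:
            del remaining[e]
    return ordered, sorted(remaining), passes


# Illustrative subset of relationships, using only entities named in this thread.
deps = {
    "ProductCategory": {"ProductCategory"},
    "ProductCategoryRollup": {"ProductCategory"},
    "UserLogin": {"Party"},
    "Party": {"UserLogin"},
    "FinAccountTrans": {"Payment"},
    "Payment": {"FinAccountTrans"},
}
ordered, stuck, passes = order_entities(deps)
print(ordered, stuck, passes)
```

With this input the two mutual pairs are what remain in stuck, which is
the point at which ordering entities alone stops helping.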

The two loops should be identifiable by A-B = B-A (A references B and B
references A).  The A->B->C->A loops and greater would obviously be more
difficult to identify, but none currently exist in OFBiz, so I'll
assume that they are theoretical and not likely to exist in a highly
normalized generic data model.
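
The A-B = B-A check can be sketched as follows. The function name and the
dict-of-sets input are assumptions for illustration; it only finds
two-entity loops and says nothing about longer ones.

```python
def two_entity_loops(deps):
    """Return sorted (A, B) pairs where A references B and B references A.

    deps is a hypothetical input: entity name -> set of referenced entity names.
    """
    pairs = set()
    for a, referenced in deps.items():
        for b in referenced:
            # Skip self-references; require the reverse edge B -> A.
            if a != b and a in deps.get(b, ()):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


deps = {
    "ProductCategory": {"ProductCategory"},
    "UserLogin": {"Party"},
    "Party": {"UserLogin"},
    "FinAccountTrans": {"Payment"},
    "Payment": {"FinAccountTrans"},
}
print(two_entity_loops(deps))
```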

Then you have the self-referencing entities (A=A), where you can avoid
referential integrity issues by walking the record hierarchy of that
entity parent->child.  These are easily identified because the entity
and rel-entity are equal to one another.
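
A small sketch of the parent->child walk for one self-referencing entity.
The function name, the dict-per-record shape, and the field names used in
the example (productCategoryId, primaryParentCategoryId) are assumptions
for illustration only.

```python
from collections import defaultdict, deque


def parents_first(records, id_field, parent_field):
    """Return records ordered so every parent comes before its children."""
    ids = {r[id_field] for r in records}
    children = defaultdict(list)
    queue = deque()
    for r in records:
        parent = r.get(parent_field)
        # No parent, or a parent outside this batch: safe to load first.
        if parent is None or parent not in ids:
            queue.append(r)
        else:
            children[parent].append(r)
    ordered = []
    while queue:
        r = queue.popleft()
        ordered.append(r)
        queue.extend(children.pop(r[id_field], []))
    if children:
        # Records never reached from a root point at each other in a cycle.
        raise ValueError("records form a cycle and cannot be ordered")
    return ordered


records = [
    {"productCategoryId": "SHOES", "primaryParentCategoryId": "APPAREL"},
    {"productCategoryId": "CATALOG", "primaryParentCategoryId": None},
    {"productCategoryId": "APPAREL", "primaryParentCategoryId": "CATALOG"},
]
for r in parents_first(records, "productCategoryId", "primaryParentCategoryId"):
    print(r["productCategoryId"])
```

A cycle among the records themselves is reported as an error rather than
guessed at, since no insert order can satisfy it without nulling a key.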

The two restricting loops are both A->B->A
1. UserLogin->Party->UserLogin
This is caused by a denormalized (non-normalized) field Party.createdBy
and the application-specific field UserLogin.partyId.

2. FinAccountTrans->Payment->FinAccountTrans

I haven't looked at the application logic, but it appears by looking at
the data model that either FinAccountTrans.paymentId or
Payment.finAccountTransId is redundant.  Judging by the rest of
FinAccountTrans, I would say that the paymentId is the one misplaced as
there is much denormalized information.  I wouldn't suspect that this
is a heavily read area of the data model that requires denormalization.


#1 can be addressed by ordering the records, or by treating it as a
graph: create a two column temporary join table (A__B, i.e.
UserLogin__Party) to hold the referential data, set the fk to null,
load all the records, then run an update from the temporary table to
the original entity.
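
A rough sketch of the temporary join table approach, using Python's
sqlite3 module against an in-memory database. The table and column names
(party, user_login, created_by, user_login__party) and both function
names are hypothetical stand-ins, not the real OFBiz schema or any OFBiz
API.

```python
import sqlite3


def make_schema(conn):
    # Hypothetical two-table loop: party.created_by -> user_login,
    # user_login.party_id -> party.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE party (party_id TEXT PRIMARY KEY, "
                 "created_by TEXT REFERENCES user_login(user_login_id))")
    conn.execute("CREATE TABLE user_login (user_login_id TEXT PRIMARY KEY, "
                 "party_id TEXT REFERENCES party(party_id))")


def load_with_deferred_fk(conn, parties, user_logins):
    cur = conn.cursor()
    # 1. Park the looping reference in a two column temporary table.
    cur.execute("CREATE TEMP TABLE user_login__party (party_id TEXT, created_by TEXT)")
    cur.executemany("INSERT INTO user_login__party VALUES (?, ?)",
                    [(p["party_id"], p["created_by"]) for p in parties])
    # 2. Load one side of the loop with the fk set to null.
    cur.executemany("INSERT INTO party (party_id, created_by) VALUES (?, NULL)",
                    [(p["party_id"],) for p in parties])
    # 3. Load the other side; its references now resolve.
    cur.executemany("INSERT INTO user_login (user_login_id, party_id) VALUES (?, ?)",
                    [(u["user_login_id"], u["party_id"]) for u in user_logins])
    # 4. Restore the parked references from the temporary table.
    cur.execute("UPDATE party SET created_by = ("
                "SELECT t.created_by FROM user_login__party t "
                "WHERE t.party_id = party.party_id) "
                "WHERE party_id IN (SELECT party_id FROM user_login__party)")
    conn.commit()
    cur.execute("DROP TABLE user_login__party")


conn = sqlite3.connect(":memory:")
make_schema(conn)
load_with_deferred_fk(
    conn,
    parties=[{"party_id": "admin", "created_by": "admin"}],
    user_logins=[{"user_login_id": "admin", "party_id": "admin"}],
)
print(conn.execute("SELECT party_id, created_by FROM party").fetchall())
```

Inserting the party row with created_by already filled in would fail the
foreign key check here, because the user_login row it points to does not
exist yet.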

#2 can probably be addressed by fixing the logic as there are likely
1:1 relationships between the records and therefore a misplaced fk.
