Chris, David,
Wow, this is getting complicated.
2 questions.
This data import tool, can it do a data dump with foreign_key_checks turned off? I'd like to do a
data import from XML, but not have to do the graph walk.
Anybody knows of any tools to dump MySQL data (verbatim, with only platform-specific translations)
into a PostgreSQL?
I've looked into such "walk the graph/hierarchy" thing when I did my first data migration (not
even data mapping, just dumping same structure to same structure). After spending a week to
finally "get it!", I was slapped for wasting my time. Somebody did a "foreign_key_checks=0",
dumped the data into database, and set the flag back to "on". I'd agree this is a tough nut to
crack; haven't found anyone in the IT industry (other than PhDs doing database design theses?)
who'd want to bother cracking this.
About highly normalized data. Yes, I agree there are loops. In the real world, a Vendor can often
be a Customer of another Vendor, who is a Customer of the first Vendor. Ah, you get the picture
(I'm getting a headache).
Jonathon
Chris Howe wrote:
First, thank you very much for the feedback! Your comments have been
extremely helpful in me wrapping my brain around this nut. Comments
inline.
--- "David E. Jones" <[EMAIL PROTECTED]> wrote:
I'm not sure what you used to find loops in the graph but I know just
off the top of my head that there are others, and I'm guessing quite
a few others. Actually a highly normalized data model that fulfills a
large number of different requirements probably tends to cause more
loops.
I put the logic at the end for critique
Some off the of my head real quick:
- ProductCategory points to self
All of the self referential issues have been accounted for. These are
found with a relationship of entity=relEntity and therefore records
must be ordered for the pass to be successful.
- ProductStore points to Facility, Facility points to ProductStore
This is a one-nofk relationship. I assumed that this type of
relationship doesn't require maintaining referential integrity. Was
this assumption correct? If so, would this be a better relationship
type for UserLogin->Party?
Certain other ones are loops but are actually workable, like
ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) ->
ProductCategoryRollup -> ProductCategory(1) which is okay because you
can do the ProductCategory records first and then do the rollups.
This is another one-nofk
One thing to consider is that there are certain loops that can
possibly be resolved by sorting records instead of entities. That
makes things significantly more complicated, but for certain
scenarios it's the only solution.
This is how the UserLogin -> Party would be handled assuming we don't
change the relationship to one-nofk :-)
If you really want to go for this and try to code something that will
sort both entities and records within entities to get something that
will handle most but not all cases, go for it, I won't stop you.
I'm just warning you that I looked into this once and did the
background work and found it to be a tough nut to crack, probably not
worth the effort as other solutions are easier. Also, this has been
tried before by other people who made some progress when they got
into testing with real data and such ran into problem after problem
and never found a workable solution.
Maybe the best thing for me to do is shut up and let you give it a
try. Let me know how it goes.
I wish you wouldn't. The gotcha scenarios are very helpful in the
cracking process. If the gotchas can't ungotched...then warnings and
caveats can be put into the most workable solution so that it can be
left for someone else to be able to pick it up and only work on the
uncracked portions.
-David
Two Entities:
1) EntityMap [entity, insertionPass, insertionScenario]
2) EntityRelationshipMap [primaryEntity, relatedEntity,
ignoreRelationshipFlag]
Logic process:
Setup-
1) Load all entities into EntityMaps;
2) Load all relationships of type one into EntityRelationshipMaps; set
all ignoreRelationshipFlags to "N";
Locate relationship scenario A->B->A
3) Iterate list looking for instances of primaryEntity_relatedEntity =
relatedEntity_primaryEntity. When found set ignoreRelationshipFlag to
"Y" and set EntityMap.insertionScenario for the primaryEntity and
relatedEntity to "ABA"
Locate relationship scenario A->A
4) Iterate list looking for instances where primaryEntity =
relatedEntity; set ignoreRelationshipFlag to "Y" and set
EntityMap.insertionScarion to "AA"
Determine order
5) Find all EntityMaps where insertionPass is null
6) Iterate list; find the number of times it exists in the
EntityRelationshipMap where ignoreRelationshipFlag = "N"
7) For all that return 0, set EntityMaps.insertionPass to current pass
number; find all records where the current entity exists in
EntityRelationshipMap.relatedEntity and set
EntityRelationshipMap.ignoreRelationshipFlag to "Y"
8) increment the current pass; go to 5 until no more exist.