On 8/16/2011 7:14 PM, Ron Savage wrote:
Hi Steve

On Tue, 2011-08-16 at 09:48 -0400, Stephen Woodbridge wrote:
[snip]
Now, if some interface code allows the user to create INDIs, say, then
they have to be flagged as having a different UUID. Or do they?

If the original UUID belonged to the source, then yes, since the new
INDIs are coming from a different source.

I think this is the correct answer. The UUID belongs to the source of
the import action that created the data, when the import did not have
UUIDs of its own.

Adding a UUID for the import action would then allow all the data to be
purged later if need be, so there might be value in adding a UUID
to the import even if the imported data already has UUIDs.
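A minimal sketch of the purge idea, assuming a simple in-memory store (nothing in this thread specifies one). `Record`, `Database`, `import_records` and `purge_import` are hypothetical names. Each record keeps its own UUID, taken from the file when it supplies one, and separately carries the UUID of the import action that created it, so purging works whether or not the imported data had UUIDs.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape: the record's own UUID, its GEDCOM tag,
    # its payload, and the UUID of the import action that created it here.
    record_uuid: uuid.UUID
    tag: str
    data: dict
    import_uuid: uuid.UUID


class Database:
    def __init__(self):
        self.records = {}  # record_uuid -> Record
        self.imports = {}  # import_uuid -> description of the import action

    def import_records(self, description, items):
        """Log one import action and stamp every record it creates with its UUID.

        Each item is (tag, data) or (tag, data, existing_uuid); a record keeps
        its own UUID if the file supplied one, otherwise it gets a fresh one.
        """
        import_uuid = uuid.uuid4()
        self.imports[import_uuid] = description
        for tag, data, *existing in items:
            rec_uuid = existing[0] if existing else uuid.uuid4()
            self.records[rec_uuid] = Record(rec_uuid, tag, data, import_uuid)
        return import_uuid

    def purge_import(self, import_uuid):
        """Delete every record created by one import action; return the count."""
        doomed = [k for k, r in self.records.items() if r.import_uuid == import_uuid]
        for k in doomed:
            del self.records[k]
        self.imports.pop(import_uuid, None)
        return len(doomed)
```

For example, after importing Joe's file and Ann's file, `db.purge_import(ann_import)` removes exactly the records Ann's import created and leaves Joe's untouched.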

I think we're getting things clearer now.

To summarize: For a given db, each source which contributes records must
be separately identified by a UUID, with that UUID attached (somehow) to
each record imported.

That means various types of reports:

o Pick 2 UUIDs and process (e.g. compare, update, export, delete) just
the records belonging to those UUIDs.

o Pick 2 UUIDs and flag records such that data with (from) UUID #1 is
deemed more reliable than data with UUID #2. Clearly both datasets are
preserved.

o Pick 1 UUID and process (e.g. update, export, delete, ...) just the
records belonging to it.

o Many other possibilities ...
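The first three report types could be sketched roughly as follows, assuming every fact carries the UUID of the source that contributed it. `Fact`, `select_by_sources` and `preferred_view` are hypothetical names, and the "subject plus tag" keying is an assumption made for illustration. The preference report only builds a view; the underlying fact list is never modified, so both datasets are preserved.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    # Hypothetical shape: what the fact is about, which field, its value,
    # and the UUID of the source that contributed it.
    subject: str        # e.g. an INDI identifier
    tag: str            # e.g. "BIRT.DATE"
    value: str
    source_uuid: uuid.UUID


def select_by_sources(facts, source_uuids):
    """Keep only the facts contributed by the chosen source(s)."""
    wanted = set(source_uuids)
    return [f for f in facts if f.source_uuid in wanted]


def preferred_view(facts, ranking):
    """For each (subject, tag), pick the value from the highest-ranked source.

    `ranking` lists source UUIDs from most to least reliable; facts from
    unranked sources are ignored. The input list is not modified.
    """
    rank = {s: i for i, s in enumerate(ranking)}
    best = {}
    for f in facts:
        if f.source_uuid not in rank:
            continue
        key = (f.subject, f.tag)
        if key not in best or rank[f.source_uuid] < rank[best[key].source_uuid]:
            best[key] = f
    return {key: f.value for key, f in best.items()}
```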

Good stuff!


Hi Ron,

Yes, exactly. But I'm not sure that it should be limited to sources, because from a single source you might have some good data and some bad data. So it is fine to say this is more reliable than that, but you also need to be able to say this item is flat-out wrong.
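One way to keep the two judgements separate is to store a per-item verdict alongside the per-source ranking. This is a sketch under assumptions: `Verdict`, `Item` and `usable_items` are hypothetical names, and the two-state verdict is a deliberately small stand-in for whatever set of judgements a real system would need. An item flagged WRONG is dropped even when its source is the most trusted one.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical per-item judgement, independent of source reliability.
    UNREVIEWED = "unreviewed"
    WRONG = "wrong"


@dataclass
class Item:
    item_uuid: uuid.UUID
    source_uuid: uuid.UUID
    value: str
    verdict: Verdict = Verdict.UNREVIEWED


def usable_items(items, source_rank):
    """Order items by source reliability (lower rank = more reliable),
    dropping any item individually flagged WRONG, whatever its source."""
    keep = [i for i in items if i.verdict is not Verdict.WRONG]
    # Sources missing from source_rank sort after all ranked ones.
    return sorted(keep, key=lambda i: source_rank.get(i.source_uuid, len(source_rank)))
```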

Obviously you can go crazy with this and tag everything with a UUID, but here are the things I think are most important:

1. INDI, FAMI, and NOTE records, and the individual items attached to these
2. Actions performed on these or on the database

Oh! I just realized something: we are mixing two different needs here.

1. INDI, FAMI, SOUR and NOTE records need a UUID that does not change; this is a persistent object identifier. In a given system, INDI::UUID=27 should always get me the same INDI, regardless of how it has been edited.

2. For history and object version tracking, so you can merge or re-merge a data set, you need a version number that gets incremented every time the object is changed. Say I import Joe's GEDCOM and merge it with my file in January, and then in August I get an update. I can ignore all of Joe's UUIDs whose version in the new import is the same as the one I already have; only UUIDs that I do not have, or that have new versions, need to be merged.
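The August re-import check could look something like this, assuming I kept a map of UUID to version from Joe's January import. `Obj` and `needs_merge` are hypothetical names, and the UUIDs in the usage example are short placeholder strings rather than real UUIDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    obj_uuid: str   # persistent object identifier; never changes
    version: int    # incremented every time the object is edited
    data: str


def needs_merge(previous_import, new_import):
    """Return only the objects from the new import that are new or changed.

    previous_import: dict of obj_uuid -> version seen in the last import
                     from this same source
    new_import: iterable of Obj from the latest file
    """
    return [o for o in new_import
            if o.obj_uuid not in previous_import
            or o.version > previous_import[o.obj_uuid]]
```

With `jan = {"u1": 1, "u2": 3}` and an August file containing `u1` at version 1, `u2` at version 4 and a new `u3`, only `u2` and `u3` come back for merging.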

So I think we have two separate needs here that should not get merged, to avoid confusion: objects need UUIDs, and actions (add, import, edit, delete, etc.) cause version changes to objects. Is a version just another UUID? If an object like an INDI or INDI::BIRT exists in two separate systems and is edited in both, you would not want the two edits to be able to end up with the same version number.
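To illustrate why a plain counter is risky: if systems A and B both start from version 1 and each makes one edit, both end up at "version 2" with different content. A sketch of the "version is just another UUID" idea, assuming each edit gets a fresh random UUID plus a pointer to the version it was based on (`Version` and `edit` are hypothetical names):

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    obj_uuid: uuid.UUID          # persistent identity; never changes
    version_uuid: uuid.UUID      # fresh random UUID for every edit
    parent: Optional[uuid.UUID]  # version this edit was based on (None = created)
    data: str


def edit(v, new_data):
    """Make a new version of the same object: same identity, new version UUID."""
    return Version(v.obj_uuid, uuid.uuid4(), v.version_uuid, new_data)
```

Two independent edits of the same base keep the same `obj_uuid` and the same `parent`, but their `version_uuid` values differ, so they cannot be mistaken for one another.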

So, a possible use case: I create an INDI in system A and it gets UUID=x. It is exported to a GEDCOM and imported into another system B; I assume it retains its UUID=x but also has some additional information attached recording that it was imported. Now the BIRT record is added/updated separately in both systems. Later I import system B's data back into system A.
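When B's data comes back into A, each item (say INDI::BIRT for UUID=x) can be classified with a three-way comparison against the version A originally exported. A sketch, assuming A remembers that exported version id and that each system has a single line of edits since the exchange; `reconcile` and the returned labels are hypothetical.

```python
def reconcile(base, ours, theirs):
    """Classify one item when re-importing system B's data into system A.

    base:   version id A exported to B (the common ancestor)
    ours:   current version id in system A
    theirs: version id found in B's export
    """
    if theirs == ours:
        return "same"         # nothing to do
    if theirs == base:
        return "keep_ours"    # only A changed it
    if ours == base:
        return "take_theirs"  # only B changed it
    return "conflict"         # both changed it independently; needs review
```

Only the "conflict" case needs a human; the other three can be handled automatically.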

I'm just trying to think this through. There are obviously a lot of additional nuances that can be put on this, like each BIRT record could reference a SOUR record that would have a UUID, and later identical or similar SOUR records could be merged, keeping both UUIDs.
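"Merged, keeping both UUIDs" could be handled with an alias table: one UUID becomes canonical and the merged-away UUID points to it, so a BIRT that still references the old SOUR UUID resolves to the merged record. A sketch; `SourceRegistry` and its methods are hypothetical, and it simply keeps the surviving record's data rather than combining the two.

```python
import uuid


class SourceRegistry:
    """Hypothetical store where merged SOUR records keep all their UUIDs."""

    def __init__(self):
        self.records = {}  # canonical uuid -> record data
        self.alias = {}    # merged-away uuid -> uuid it was merged into

    def add(self, rec_uuid, data):
        self.records[rec_uuid] = data

    def resolve(self, rec_uuid):
        """Follow aliases so old references (e.g. from a BIRT) still work."""
        while rec_uuid in self.alias:
            rec_uuid = self.alias[rec_uuid]
        return rec_uuid

    def merge(self, keep, drop):
        """Merge SOUR `drop` into `keep`; keep's data survives, drop's UUID
        survives as an alias."""
        keep, drop = self.resolve(keep), self.resolve(drop)
        if keep == drop:
            return keep
        self.records.pop(drop)
        self.alias[drop] = keep
        return keep

    def get(self, rec_uuid):
        return self.records[self.resolve(rec_uuid)]
```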

Maybe this is getting to be overkill? Is anyone else following this thread? I see these things as a significant aid to managing, merging, and updating data in an automated way. But then again, maybe no one else cares.

Thoughts?
  -Steve
