Re: A draft proposal for UUIDs

Ron Savage Wed, 17 Aug 2011 00:31:40 -0700

Hi Steve

On Tue, 2011-08-16 at 21:11 -0400, Stephen Woodbridge wrote:
> On 8/16/2011 7:14 PM, Ron Savage wrote:
> > Hi Steve
> >
> > On Tue, 2011-08-16 at 09:48 -0400, Stephen Woodbridge wrote:
> > [snip]
> >>> Now, if some interface code allows the user to create INDIs, say, then
> >>> they have to be flagged as having a different UUID. Or do they?
> >>>
> >>> If the original UUID belonged to the source, then yes, since the new
> >>> INDIs are coming from a different source.
> >>
> >> I think this is the correct answer. The UUID belongs to the source of
> >> the import action that created the data when the import did not have
> >> UUIDs of its own.
> >>
> >> Adding a UUID for the import action would then allow all the data to be
> >> purged later if need be, so there might be value in adding a UUID
> >> to the import even if the imported data already has UUIDs.
> >
> > I think we're getting things clearer now.
> >
> > To summarize: For a given db, each source which contributes records must
> > be separately identified by a UUID, with that UUID attached (somehow) to
> > each record imported.
> >
> > That means various types of reports:
> >
> > o Pick 2 UUIDs and process (e.g. compare, update, export, delete) just
> > the records belonging to those UUIDs.
> >
> > o Pick 2 UUIDs and flag records such that data with (from) UUID #1 is
> > deemed more reliable than data with UUID #2. Clearly both datasets are
> > preserved.
> >
> > o Pick 1 UUID and process (e.g. update, export, delete, ...) just the
> > records belonging to it.
> >
> > o Many other possibilities ...
> >
> > Good stuff!
> >
> 
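
A minimal sketch of the summary above, assuming an in-memory list as the db
and a hypothetical source_uuid field on each record (neither is part of any
agreed design): every import gets its own UUID, that UUID is attached to each
record it contributes, and a report or purge then selects on it.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Record:
    tag: str                 # record type, e.g. "INDI"
    name: str
    source_uuid: uuid.UUID   # hypothetical field: the import that supplied this record


def import_records(db, rows):
    """Tag every row of one import with a fresh source UUID and return it."""
    source = uuid.uuid4()
    db.extend(Record(tag, name, source) for tag, name in rows)
    return source


def records_from(db, source):
    """Select just the records belonging to one source UUID."""
    return [r for r in db if r.source_uuid == source]


def purge_source(db, source):
    """Delete everything a given import contributed."""
    db[:] = [r for r in db if r.source_uuid != source]


db = []
joe = import_records(db, [("INDI", "John Smith"), ("INDI", "Mary Smith")])
ann = import_records(db, [("INDI", "John Smith")])

print(len(records_from(db, joe)))   # records contributed by the first import
purge_source(db, ann)               # drop the second import entirely
print(len(db))
```

Comparing two sources, or ranking one as more reliable than the other, would
be further filters on the same field.
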
> Hi Ron,
> 
> Yes exactly, but I'm not sure that it should be limited to sources because 
> from a single source you might have some good data and some bad data. So 
> it is fine to say this is more reliable than that, but you also need to 
> be able to say this item is flat out wrong.
> 
> Obviously you can go crazy with this and tag everything with a UUID, but here 
> are the things I think are most important:
> 
> 1. INDI, FAMI, and NOTE and individual items attached to these
> 2. actions performed on these or on the database
> 
> Oh! I just realized something: we are mixing two different needs here.
> 
> 1. INDI, FAMI, SOUR and NOTE records need a UUID that does not change; 
> this is a persistent object identifier. In a given system INDI::UUID=27 
> should always get me the same INDI regardless.
> 
> 2. For history and object version tracking, so you can merge or re-merge 
> a data set, you need a version number that gets incremented every time 
> the object gets changed. So say I import Joe's GEDCOM and merge it with 
> my file in January and then in August I get an update. I can ignore all 
> the UUIDs from Joe that have the same version as in the new import; only 
> UUIDs that I do not have, or that have new versions, need to be merged.
> 
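
A rough sketch of the check in point 2, assuming each side can be reduced to
a plain mapping of UUID to an integer version counter (the function name and
the short string UUIDs are made up for illustration):

```python
def uuids_to_merge(mine, incoming):
    """Return the incoming UUIDs that are unknown locally or carry a newer version.

    mine, incoming: dict mapping UUID (as a string) to an integer version.
    """
    return sorted(
        u for u, version in incoming.items()
        if u not in mine or version > mine[u]
    )


# January: what I hold after merging Joe's first GEDCOM.
mine = {"a": 1, "b": 2}
# August: Joe's update. "a" is unchanged, "b" was edited, "c" is new.
incoming = {"a": 1, "b": 3, "c": 1}

print(uuids_to_merge(mine, incoming))   # ['b', 'c']
```

This only holds while an object is edited in one system at a time; a bare
counter cannot tell two independent edits apart.
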
> So I think we have two separate needs here that should not get merged, to 
> avoid confusion: Objects need UUIDs, and Actions (add, import, edit, 
> delete, etc.) cause version changes to Objects. Is a version just another 
> UUID? If an object like an INDI or INDI::BIRT is in two separate systems 
> and is edited in both systems, you would not want them to be able to have 
> the same version number.
> 
> So a possible use case: I create an INDI in system A and it has UUID=x, 
> and this is exported to a GEDCOM and imported into another system B. I 
> assume it retains its UUID=x but also has some additional information 
> attached to it saying that it was imported. Now the BIRT record is 
> added/updated separately in both systems. Later I import 
> system B back into system A.
> 
> I'm just trying to think this through. There are obviously a lot of 
> additional nuances that can be put on this, like each BIRT record could 
> reference a SOUR record that would have a UUID and later identical or 
> similar SOUR records could be merged keeping both UUIDs.
> 
> Maybe this is getting to be overkill? Is anyone else following this 
> thread? I see these things as being a significant aid to managing data 
> and merging and updating data in an automated way. But then again maybe 
> no one else cares.


It's overkill.

We can't possibly design a mechanism for fiddling with UUIDs in order to
emulate a version control system such as git. That's utterly futile.

So, we need to design UUIDs to serve whatever purpose people need which
can't be provided by git/etc.

(I'll probably answer your email's points separately. I still have to
think about the non-version control aspects of UUIDs :-).

-- 
Ron Savage
http://savage.net.au/
Ph: 0421 920 622
