Re: Validation of genealogy data

Ron Savage Sat, 06 Aug 2011 16:32:45 -0700

Hi Steve

On Sat, 2011-08-06 at 11:32 -0400, Stephen Woodbridge wrote:
> On 8/6/2011 2:28 AM, Ron Savage wrote:
> > Hi Steve
> >
> > On Sat, 2011-08-06 at 00:39 -0400, Stephen Woodbridge wrote:
> >> On 8/5/2011 9:04 PM, Darren Duncan wrote:
> >>> Ron Savage wrote:
> >
> >> http://swoodbridge.com/family/Woodbridge/index.php?indi=I2921
> >>
> >> I keep all the data in Family Tree Maker, export that to a GEDCOM, then
> >> load it using a Gedcom.pm script into a PostgreSQL database and serve the
> >> pages via php. The photos are integrated by a separate web app that
> >> allows loading, editing and linking them to the genealogy tables in the
> >> database.
> >>
> >> A really big requirement is persistent IDs for individuals. I have to be
> >> very careful to not do anything that might renumber them.
> >
> > Noted.
> >
> > Is there some specific action with programs we've mentioned which does
> > renumber them?
> 
> Well the obvious one is a renumber command ;), but merging files and
> merging individuals sometimes creates a new individual and then copies
> the data from the two merged ones into the new one, which causes the
> new one to get a new number.


OK. This is what I wanted spelled out. Saves me having to make baseless
assumptions :-).

Yes, I see the difficulty. Thinking aloud...

Let's say we try to solve this by giving each INDI a unique # (UID)
in addition to the # in the INDI statement itself.

Renumbering changes INDI # but not UID.

When combining data from 2 parties, INDI can be set to anything, but we
haven't solved the problem, since the question now is: Which UID is
definitive? A: Neither. We've gained nothing. Right? But read on...

Or is it enough to record a trail of UIDs, so a set of UIDs can be
attached to the final INDI? This allows backtracking from the combined
data to the 2 source data sets (by ignoring the actual value of INDI,
and working off the UID). Would that suffice?

No. See below.
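
For what it's worth, here is a minimal Python sketch of the
trail-of-UIDs idea at the level of whole individuals. All the names
(Individual, new_individual, renumber, merge) are made up for
illustration and don't come from Gedcom.pm or any other module, and
I'm assuming random UUIDs as the UIDs:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Individual:
    indi: str   # the # in the INDI statement, e.g. "I2921"; may change
    name: str
    # Trail of UIDs: one per source record that ended up in this INDI.
    uids: frozenset = field(default_factory=frozenset)


def new_individual(indi, name):
    """Create a record with a fresh UID (assumed here to be a random UUID)."""
    return Individual(indi, name, frozenset({uuid.uuid4().hex}))


def renumber(person, new_indi):
    """Renumbering changes the INDI # but leaves the UID trail alone."""
    return Individual(new_indi, person.name, person.uids)


def merge(a, b, new_indi):
    """Merging creates a new INDI whose trail is the union of both trails.

    Keeping a.name is an arbitrary choice made only for this sketch.
    """
    return Individual(new_indi, a.name, a.uids | b.uids)
```

After merge(a, b, "I100"), the new record's uids contain both a's and
b's UIDs, so either source can still be traced whatever "I100" is
later renumbered to.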

> But from a more general point of view and talking about versioning of 
> data, if 100 people create an INDI record for the same person in 
> separate research projects and later some of them merge their research 
> at various points in time it would be nice to know if my INDI includes 
> one or more of those other INDIs and it might be nice to know at what
> version those INDI(s) got merged into my work.

I think this is just the above, extended such that each assertion about
an individual, not just each individual, owns a set of UIDs. Make sense?

> I suppose one way of thinking about this would be like an SVN or Git
> source code repository, where files were INDIs or FACTs and there exists
> a link-like item ("LINK") that connects facts to INDIs and INDIs to other
> INDIs. I'm not suggesting this as a technical design but as a way of
> thinking about the problem of revisioning and history.

It might be tempting to use git, but I think not:

o With git (etc) the end result is, for each assertion, 1 definitive
value (after a merge), but with a history managed by git to enable
tracing of where that value came from, i.e. what the alternative values
were at the time(s) of the merge(s).

o My feeling from the discussion so far is that what's wanted is to
carry forward all versions of the assertion, in parallel so to speak.

I really think this means a set of UIDs per assertion, with the UIDs
serving as pointers back in time to the multiple sources leading to
the 'current' state.
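
A rough Python sketch of what I mean, under some loud assumptions:
the Assertion class, its fields and merge_assertions are all
hypothetical names, and the sketch assumes the hard part (deciding
that two records describe the same person, here the 'subject' key)
has already been done somehow:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str     # hypothetical key for the person being described
    attribute: str   # e.g. "BIRT.DATE"
    value: str
    uids: frozenset  # pointers back to the sources making this assertion


def merge_assertions(*datasets):
    """Carry every distinct value forward, in parallel.

    Unlike a git-style merge, no single value wins. Identical assertions
    from different sources are combined and their UID sets are unioned.
    """
    merged = {}
    for dataset in datasets:
        for a in dataset:
            key = (a.subject, a.attribute, a.value)
            merged[key] = merged.get(key, frozenset()) | a.uids
    return [Assertion(s, attr, v, uids) for (s, attr, v), uids in merged.items()]


mine = [Assertion("p1", "BIRT.DATE", "1 JAN 1850", frozenset({"uid-a"}))]
theirs = [
    Assertion("p1", "BIRT.DATE", "1 JAN 1850", frozenset({"uid-b"})),
    Assertion("p1", "BIRT.DATE", "ABT 1851", frozenset({"uid-b"})),
]
combined = merge_assertions(mine, theirs)
```

Here 'combined' holds both birth dates. The first is backed by uid-a
and uid-b, the second by uid-b only, so nothing is thrown away and
each value still points back to where it came from.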

-- 
Ron Savage
http://savage.net.au/
Ph: 0421 920 622
