Re: Validation of genealogy data

Stephen Woodbridge Sat, 06 Aug 2011 19:50:13 -0700

Hi Ron,

I think you have summed this up correctly. Using a UID or UUID and collecting a stream of them probably does the job. One subtle point is that an object identified by a UID can change over time as information is added to it, so I think version tracking on the object is a good idea. That would allow you to merge an object at some version and later deal with updating that object from its source to a new version.
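A minimal Python sketch of the version-tracking idea, assuming a hypothetical `Record` class (not from GEDCOM or any genealogy program): every edit bumps a version counter while the UID stays fixed, and a merge remembers which version of the source it absorbed, so later source updates can be detected.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical genealogy object: a stable UID plus a version counter."""
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    version: int = 1
    facts: dict = field(default_factory=dict)
    # source UID -> the version of that source we merged in
    merged_from: dict = field(default_factory=dict)

    def update(self, key, value):
        # Any content change produces a new version; the UID never changes.
        self.facts[key] = value
        self.version += 1

    def merge(self, other):
        # Copy facts we do not already have and record which version we took.
        for key, value in other.facts.items():
            self.facts.setdefault(key, value)
        self.merged_from[other.uid] = other.version
        self.version += 1

    def stale_sources(self, sources):
        # UIDs of merged sources that have moved on since we merged them.
        return [s.uid for s in sources
                if s.uid in self.merged_from and s.version > self.merged_from[s.uid]]
```

For example, after `mine.merge(theirs)`, any later `theirs.update(...)` makes `mine.stale_sources([theirs])` return `[theirs.uid]`, which is the cue to pull the newer version in.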

Overall, it sounds like you have the right ideas, and it could be a very enabling tool.


Thanks,
  -Steve

On 8/6/2011 7:30 PM, Ron Savage wrote:

Hi Steve

On Sat, 2011-08-06 at 11:32 -0400, Stephen Woodbridge wrote:

On 8/6/2011 2:28 AM, Ron Savage wrote:

Hi Steve

On Sat, 2011-08-06 at 00:39 -0400, Stephen Woodbridge wrote:

On 8/5/2011 9:04 PM, Darren Duncan wrote:

Ron Savage wrote:

http://swoodbridge.com/family/Woodbridge/index.php?indi=I2921

I keep all the data in Family Tree Maker, export that to a GEDCOM, then
load it using a Gedcom.pm script into a PostgreSQL database and serve the
pages via PHP. The photos are integrated by a separate web app that
allows loading, editing and linking them to the genealogy tables in the
database.

A really big requirement is persistent IDs for individuals. I have to be
very careful not to do anything that might renumber them.


Noted.

Is there some specific action in the programs we've mentioned that
renumbers them?


Well, the obvious one is a renumber command ;), but merging files and
merging individuals sometimes creates a new individual and then copies
the data from the two merged ones into it, which gives the new one a
new number.


OK. This is what I wanted spelled out. Saves me having to make baseless
assumptions :-).

Yes, I see the difficulty. Thinking aloud...

Let's say we try to solve this by giving each INDI a unique # (UID)
besides the # in the INDI statement itself.

Renumbering changes INDI # but not UID.
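A small sketch of that split, assuming each individual is a plain dict holding both its INDI cross-reference (`xref`) and a separate `uid`. Storing the UID in a user-defined tag such as `_UID` is an assumption here; GEDCOM reserves underscore-prefixed tags for extensions. Renumbering rewrites only the xref, and lookups that must survive it go through the UID.

```python
def renumber(individuals):
    """Give every INDI a fresh sequential xref and return the new records
    plus an old->new map (e.g. for rewriting FAMC/FAMS pointers).

    Each record is a hypothetical dict: {"xref": "@I2921@", "uid": "..."}.
    """
    mapping = {}
    renumbered = []
    for n, indi in enumerate(individuals, start=1):
        new_xref = f"@I{n}@"
        mapping[indi["xref"]] = new_xref
        # Copy the record with a new xref; the uid is carried over unchanged.
        renumbered.append({**indi, "xref": new_xref})
    return renumbered, mapping


def find_by_uid(individuals, uid):
    # Persistent references use the UID, never the renumberable xref.
    return next((i for i in individuals if i["uid"] == uid), None)
```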

When combining data from 2 parties, INDI can be set to anything, but we
haven't solved the problem, since the question now is: Which UID is
definitive? A: Neither. We've gained nothing. Right? But read on...

Or is it enough to record a trail of UIDs, so a set of UIDs can be
attached to the final INDI? This allows backtracking from the combined
data to the 2 source data sets (by ignoring the actual value of INDI,
and working off the UID). Would that suffice?
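The UID-trail idea in the question above might look like this sketch, using a hypothetical `Indi` type: a merge creates a brand-new record (as the programs described earlier do) but carries the union of both inputs' UIDs, and backtracking to the source data sets ignores the INDI number entirely. Note that it records which records were merged, but not at which version.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Indi:
    xref: str  # the INDI number; free to change on renumber or merge
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)
    trail: frozenset = frozenset()  # UIDs of every record merged into this one


def merge(a, b, new_xref):
    """Merge two individuals into a new record, keeping both inputs' UIDs
    and their existing trails."""
    return Indi(xref=new_xref, trail=a.trail | b.trail | {a.uid, b.uid})


def sources_of(indi, datasets):
    """Return the names of the data sets that contributed to `indi`.

    `datasets` is a hypothetical mapping of data set name -> list of Indi.
    Matching is done on UIDs only; xref values are ignored.
    """
    return {name for name, people in datasets.items()
            if any(p.uid in indi.trail for p in people)}
```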

No. See below.

But from a more general point of view, and talking about versioning of
data: if 100 people create an INDI record for the same person in
separate research projects, and later some of them merge their research
at various points in time, it would be nice to know whether my INDI
includes one or more of those other INDIs, and at what version those
INDI(s) got merged into my work.


I think this is just the above, extended such that each assertion about
an individual, not just each individual, owns a set of UIDs. Make sense?

I suppose one way of thinking about this would be like an SVN or Git source
code repository, where files were INDIs or FACTs, and there exists a
link-like item, or "LINK", that connects FACTs to INDIs and INDIs to other
INDIs. I'm not suggesting this as a technical design, but as a way of
thinking about the problem of revisioning and history.


It might be tempting to use git, but I think not:

o With git (etc) the end result is, for each assertion, 1 definitive
value (after a merge), but with a history managed by git to enable
tracing of where that value came from, i.e. what the alternative values
were at the time(s) of the merge(s).

o My feeling from the discussion so far is that what's wanted is to
carry forward all versions of the assertion, in parallel so to speak.

I really think this means a set of UIDs per assertion, with the UIDs'
purposes being pointers back in time to the multiple sources leading to
the 'current' state.
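To contrast this with a git-style merge that ends with one value per assertion, here is a sketch that carries every competing value forward in parallel, each with the set of source UIDs that asserted it. The input shape, `(source_uid, {assertion_key: value})`, and keys like `"BIRT.DATE"` are assumptions for illustration.

```python
from collections import defaultdict


def merge_parallel(*sources):
    """Merge assertions from several sources without choosing a winner.

    Each source is a hypothetical (source_uid, {assertion_key: value}) pair.
    Returns {assertion_key: {value: set of source UIDs}}, so every value
    survives alongside pointers back to where it came from.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source_uid, assertions in sources:
        for key, value in assertions.items():
            merged[key][value].add(source_uid)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {key: dict(values) for key, values in merged.items()}
```

Merging `("u1", {"BIRT.DATE": "1790"})` with `("u2", {"BIRT.DATE": "1791"})` keeps both dates, each tagged with its own source UID, instead of resolving to a single value.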
