Hi John
On 06/11/12 12:32, John Washburn wrote:
I agree with Mr. Woodbridge. A directed graph is a better model.
I aggree. Conceptually it must be graph.
Sure. The question is: Is there a module on CPAN which will do the
job?
I was a bit careless with the terminology. One issue to do with this
which I have not previously stated is that there is a Graph module
on
CPAN: https://metacpan.org/release/Graph
I had a little play with it, and the results were incomprehensible.
Another issue is that the docs are a bit terse, and are (reasonably)
aimed at experts in the field. The author does refer to 'my fiendish
code'.
I'm not an expert on graphs, and find the docs too difficult to
follow
to make this module a possible contender.
So I still have the problem as to which CPAN module if any I adopt.
I
have no intention to rewrite Graph, so I picked Tree::DAG_Node as a
first choice, not implying it's the best or most appropriate.
Solution: Don't know.
More below...
Consider adoption where the adoptee knows their biological parents.
This
individual has four "parents" two adoptive and two biological; or
more
precisely two mothers and two fathers.
This does not fit well into a Tree (with 1 up) but is no problem
for
the a
directed graph between these 5 individuals. Some are connected by
an edge
named "birth" and some by an edge called "adoption." A directed
graph
handles the extension of this situation where the adoptive parents
have a
biological child as well and/or the biological parent has other
children not
put up for adoption or adopted by another family. A strict tree
structure
is at best messy for this situation.
A graph traversal though can report that this person is my brother
by
adoption not blood and that this other person is my sister by blood
only
since we were adopted by different families.
The problem you will encounter is to take this nuanced, graph
structure and
"squashing" it down into a GEDCOM tree which has a design biased
toward
bloodlines over other human connections which people cherish. The
directed
graph lets model the human connections and ignore this bias until
it
is time
to export the data or create the report.
I could even see the edge having a truth value in the closed
interval:
[0..1]. For example: I am 70% sure this Percilla Chase is my
ancestor
and
30% that this Prissy Chase born in the same year one town over is
my
ancestor. The edge connecting my ancestor has two sets of birth
edges; one
for the 70% connection and one set for the 30% connection. The
only
one up
nature of a tree makes such uncertain/tentative connections
difficult to
model.
Weighting factors on edges are definitely nice-to-have, and I would
do
nothing to preempt their implementation.
-----Original Message-----
From: Stephen Woodbridge [mailto:wood...@swoodbridge.com]
Sent: Monday, November 05, 2012 6:23 PM
To: perl-gedcom@perl.org
Subject: Re: The new GEDCOM parser
Ron,
I think this is a graph not a tree or at best an interconnect
forest of
trees. Given a focus node like an individual or a family you can
view
look
at the trees up or down from that node.
In graph theory you have nodes and edges, and you can use
Dijkstra's
shortest path to find the shortest route through the graph between
the
start
and end node. It does this by converting the graph into a tree, but
doing so
does not maintain all the node because many of them are parallel
paths. You
can do the same thing with a gedcom file where nodes are
individuals and
families and the relationships are the edges.
Nodes and edges for sure. Tree versus graph, probably better to
think
of it
as a graph. Somethings you need to be able to model are:
IVF as you mentioned.
divorce/re-marriage/adoption/... again as you mentioned.
marriage and offspring of relations.
If it is a graph, you can use graph theory to process it and these
are
well
defined algorithms for traversal and manipulation of graphs.
-Steve
On 11/5/2012 4:36 PM, Ron Savage wrote:
Hi Steve
On 06/11/12 00:54, Stephen Woodbridge wrote:
Hi Ron,
This all sounds great. I have a question on your choice of using
a
tree structure, can you explain that more? Are you thinking of
the
file being the root, then having leaves like: indi, fams, famc,
etc
and then each of these have their respective data hanging off
those
objects? Or are you thinking the tree would represent the family
relationships? I don't see how the later will work.
I got the idea from the nested structure in the GEDCOM doc itself.
Any
nesting in the doc could be represented by children of a node in a
tree.
But frankly, I have not thought it through. I really mentioned it
to
help clarify my thoughts, and I strongly suspect actual coding
will
modify my plan.
But in a bit more detail: An individual can be seen as a node in a
tree, in which case:
o They have a list of (2) grandparents (IVF aside!)
o They have a list of (N) spouses
o They have a list of (N) children
o (As a child) They have a list of (N) care-givers
o (As a patient) They have a list of (N) donors or organs or
whatever
And fundamentally, in a tree, every node has N links (normally 1
up, N
sideways and N down), and those links have metadata which would be
used to represent the type of link.
The most obvious problem with a tree is the cross-links due to
divorce/re-marriage/adoption/...
Still, whatever the data structure chosen, /something/ has to be
chosen just to hold the data in memory.
-Steve
On 11/5/2012 2:04 AM, Ron Savage wrote:
Hi
The new GEDCOM parser
This document is a collection of ideas which have been
percolating
in my mind for a long time.
Comments welcome.
Ideas
Module name
Genealogy::Gedcom::Parser.
A place-holder, Genealogy::Gedcom
<http://metacpan.org/release/Genealogy-Gedcom>, is already on
CPAN.
Note: This module was written before the new, major tools now
available were released. See Tools below.
ETA
There is no ETA for the parser.
However, certain Perl-based tools are now available which will
make
coding a simple task. See Tools below.
See also 'Famous Last Words' :-).
UTF-8
The code will accept input files in utf-8, and generate files
containing utf-8 characters.
Apache and mod_perl
These will not be required. I only mention these because
references
to them appear in the Gedcom.pm distro.
Logging
The code will have a built-in logger, so debugging, e.g., can be
turned on with a parameter to new().
This logger will use Log::Handler. See Tools below.
Sub-classing
Sub-classing the main module will be trivial, and samples will
be
provided.
Sub-classing will be done with Hash::FieldHash. See Tools and
the
FAQ below.
Grammars and grammar generators
Like Gedcom.pm, the code will read a GEDCOM grammar in BNF from
a
file.
I'll run this phase before shipping the module, so you don't
have to.
See Tools below, specifically Marpa::Rules::Simple.
Bascially, this means the startling complexity of the code in
Gedcom.pm is a thing of the past.
Operating the parser
Using Marpa, callbacks are triggered when input is recognized.
So, when lines like these are encountered:
1 @<XREF:FAM>@ FAM
2 RIN<AUTOMATED_RECORD_ID>
Marpa will automatically call the callback attached to each tag.
Callbacks will probably have names like 'do_fam' and 'do_rin',
i.e.
of the format 'do_$tag'.
The parameters passed to the callback include the non-tag text
on
the line.
Default callbacks for all tags will be provided, each one doing
its
part in parsing the parameters to the tag, and storing the
result.
The result will probably be stored in a tree. See Tools below,
specifically Tree::DAG_Node.
Database support
A DBD::SQLite database is possible.
Tools
o Hash::FieldHash
Simplifies class-building.
As for the obvious question, why not use Moose, see the FAQ
below.
o Log::Handling
Simplifies logging.
o Marpa::R2
This is the modern way to do parsing.
Home page<http://jeffreykegler.github.com/Marpa-web-site/>.
Jeffrey's blog about Marpa
<http://jeffreykegler.github.com/Ocean-of-Awareness-blog/>.
My recent article about lexing and parsing with Marpa
<http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html>.
o MarpaX::Simple::Rules
This module reads a grammar in BNF and generates a Marpa
grammar.
Hence it will read a BNF version of the GEDCOM spec and output
the
matching Marpa grammar.
o Tree::DAG_Node
The most sophisticated tree-handling code on CPAN. I've recently
become co-maintainer of this module.
FAQ
Why did you choose Hash::FieldHash over Moose?
My policy is to use the light-weight Hash::FieldHash for
stand-alone
modules and Moose for applications.
Why did you choose to store the data in a tree?
A GEDCOM file's structure can be viewed as a tree, so my initial
plan is to store the data likewise.
-----
No virus found in this message.
Checked by AVG - www.avg.com
Version: 2013.0.2742 / Virus Database: 2617/5876 - Release Date:
11/05/12