Hi Steve

On 06/11/12 11:23, Stephen Woodbridge wrote:
Ron,

I think this is a graph not a tree or at best an interconnect forest of
trees. Given a focus node like an individual or a family you can view
look at the trees up or down from that node.

In graph theory you have nodes and edges, and you can use Dijkstra's
shortest path to find the shortest route through the graph between the
start and end node. It does this by converting the graph into a tree,
but doing so does not maintain all the node because many of them are
parallel paths. You can do the same thing with a gedcom file where nodes
are individuals and families and the relationships are the edges.

Nodes and edges for sure. Tree versus graph, probably better to think of
it as a graph. Somethings you need to be able to model are:

IVF as you mentioned.
divorce/re-marriage/adoption/... again as you mentioned.
marriage and offspring of relations.

If it is a graph, you can use graph theory to process it and these are
well defined algorithms for traversal and manipulation of graphs.

Agreed.

In the module I was planning to use, Tree::DAG_Node, DAG is in fact Directed Acyclic Graph :-).

Of course it may turn out to not be the best choice in these circumstances.

I'm also trying somewhat to avoid graph theory terminology at this stage, since it's all hand-waving so far.

Further reading for interested parties:

https://en.wikipedia.org/wiki/Graph_theory


-Steve

On 11/5/2012 4:36 PM, Ron Savage wrote:
Hi Steve

On 06/11/12 00:54, Stephen Woodbridge wrote:
Hi Ron,

This all sounds great. I have a question on your choice of using a tree
structure, can you explain that more? Are you thinking of the file being
the root, then having leaves like: indi, fams, famc, etc and then each
of these have their respective data hanging off those objects? Or are
you thinking the tree would represent the family relationships? I don't
see how the later will work.

I got the idea from the nested structure in the GEDCOM doc itself. Any
nesting in the doc could be represented by children of a node in a tree.

But frankly, I have not thought it through. I really mentioned it to
help clarify my thoughts, and I strongly suspect actual coding will
modify my plan.

But in a bit more detail: An individual can be seen as a node in a tree,
in which case:

o They have a list of (2) grandparents (IVF aside!)

o They have a list of (N) spouses

o They have a list of (N) children

o (As a child) They have a list of (N) care-givers

o (As a patient) They have a list of (N) donors or organs or whatever

And fundamentally, in a tree, every node has N links (normally 1 up, N
sideways and N down), and those links have metadata which would be used
to represent the type of link.

The most obvious problem with a tree is the cross-links due to
divorce/re-marriage/adoption/...

Still, whatever the data structure chosen, /something/ has to be chosen
just to hold the data in memory.

-Steve

On 11/5/2012 2:04 AM, Ron Savage wrote:
Hi

The new GEDCOM parser
This document is a collection of ideas which have been percolating
in my mind for a long time.

Comments welcome.

Ideas
Module name
Genealogy::Gedcom::Parser.

A place-holder, Genealogy::Gedcom
<http://metacpan.org/release/Genealogy-Gedcom>, is already on CPAN.

Note: This module was written before the new, major tools now
available were released. See Tools below.

ETA
There is no ETA for the parser.

However, certain Perl-based tools are now available which will make
coding a simple task. See Tools below.

See also 'Famous Last Words' :-).

UTF-8
The code will accept input files in utf-8, and generate files
containing utf-8 characters.

Apache and mod_perl
These will not be required. I only mention these because references
to them appear in the Gedcom.pm distro.

Logging
The code will have a built-in logger, so debugging, e.g., can be
turned on with a parameter to new().

This logger will use Log::Handler. See Tools below.

Sub-classing
Sub-classing the main module will be trivial, and samples will be
provided.

Sub-classing will be done with Hash::FieldHash. See Tools and the
FAQ below.

Grammars and grammar generators
Like Gedcom.pm, the code will read a GEDCOM grammar in BNF from a
file.
I'll run this phase before shipping the module, so you don't have to.
See Tools below, specifically Marpa::Rules::Simple.

Bascially, this means the startling complexity of the code in
Gedcom.pm is a thing of the past.

Operating the parser
Using Marpa, callbacks are triggered when input is recognized.

So, when lines like these are encountered:

1 @<XREF:FAM>@ FAM
2 RIN <AUTOMATED_RECORD_ID>

Marpa will automatically call the callback attached to each tag.

Callbacks will probably have names like 'do_fam' and 'do_rin', i.e.
of the format 'do_$tag'.

The parameters passed to the callback include the non-tag text on
the line.

Default callbacks for all tags will be provided, each one doing its
part in parsing the parameters to the tag, and storing the result.

The result will probably be stored in a tree. See Tools below,
specifically Tree::DAG_Node.

Database support
A DBD::SQLite database is possible.

Tools
o Hash::FieldHash
Simplifies class-building.

As for the obvious question, why not use Moose, see the FAQ below.

o Log::Handling
Simplifies logging.

o Marpa::R2
This is the modern way to do parsing.

Home page <http://jeffreykegler.github.com/Marpa-web-site/>.

Jeffrey's blog about Marpa
<http://jeffreykegler.github.com/Ocean-of-Awareness-blog/>.

My recent article about lexing and parsing with Marpa

<http://www.perl.com/pub/2012/10/an-overview-of-lexing-and-parsing.html>.



o MarpaX::Simple::Rules
This module reads a grammar in BNF and generates a Marpa grammar.

Hence it will read a BNF version of the GEDCOM spec and output
the matching Marpa grammar.

o Tree::DAG_Node
The most sophisticated tree-handling code on CPAN. I've recently
become co-maintainer of this module.

FAQ
Why did you choose Hash::FieldHash over Moose?
My policy is to use the light-weight Hash::FieldHash for stand-alone
modules and Moose for applications.

Why did you choose to store the data in a tree?
A GEDCOM file's structure can be viewed as a tree, so my initial
plan is to store the data likewise.












--
Ron Savage
http://savage.net.au/
Ph: 0421 920 622

Reply via email to