Re: The new GEDCOM parser

2012-11-06 Thread Ron Savage

Hi Steve

On 06/11/12 16:11, Stephen Woodbridge wrote:

Hi Ron,

I work with graphs for doing vehicle routing so have some familiarity
with them.


Good.


I think this is the family of graph tool to look at using:

http://search.cpan.org/~jhi/Graph/


Yes, that's the one I earlier said /I/ had trouble with. See a previous msg.

Perhaps I will need to adopt it despite my recent experience!


and I think these work with:

Graph::Writer
Graph::Reader


Yes they do. The author of Graph recommends them.


And it is likely that this can trivially be integrated with graph
rendering tools like GraphViz and/or one of the other tools using this:

http://search.cpan.org/~neilb/Graph-ReadWrite-2.03/lib/Graph/Writer/Dot.pm
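
For readers who have not seen DOT output, here is a minimal sketch of what such an export might produce. It is written in Python, not Perl, and does not use Graph::Writer::Dot itself; the function name to_dot and the sample people are invented for illustration, and node names are assumed to contain no double quotes.

```python
def to_dot(edges, name="family"):
    """Render (source, target, label) triples as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for source, target, label in edges:
        # Assumption: names and labels contain no double quotes, so no escaping is done.
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data: two parent -> child edges with different labels.
edges = [
    ("Mother", "Child", "birth"),
    ("Adoptive mother", "Child", "adoption"),
]
dot_text = to_dot(edges)
print(dot_text)
```

The resulting text can be fed to the Graphviz command-line tools for rendering.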

I also found a couple of modules that might be interesting to play with:

Graph::Similarity
Graph::Matching


Clearly work has been done to write add-ons for Graph.


If these can be applied to the task of matching and merging overlapping
GEDCOM files.


Interesting idea...


And this module can be used to save and restore Graph structures in
relational databases.


I did not check that deeply into their capabilities.


Ok, this is starting to look very interesting. I guess it all starts
with being able to move data to/from Graph structures and GEDCOM files.

Ron, you got this coded yet? ;)


No. I seem to be spending way too much time answering emails. Hahahaha.

Perhaps the familiarity you mention above will provide you with the 
wherewithal to beat me to it :-)). I look forward to your coming upload 
to CPAN



-Steve

On 11/5/2012 10:21 PM, Ron Savage wrote:

Hi John

On 06/11/12 12:32, John Washburn wrote:

I agree with Mr. Woodbridge. A directed graph is a better model.


I agree. Conceptually it must be a graph.

Sure. The question is: Is there a module on CPAN which will do the job?

I was a bit careless with the terminology. One issue to do with this
which I have not previously stated is that there is a Graph module on
CPAN: https://metacpan.org/release/Graph

I had a little play with it, and the results were incomprehensible.

Another issue is that the docs are a bit terse, and are (reasonably)
aimed at experts in the field. The author does refer to 'my fiendish
code'.

I'm not an expert on graphs, and find the docs too difficult to follow
to make this module a possible contender.

So I still have the problem as to which CPAN module if any I adopt. I
have no intention to rewrite Graph, so I picked Tree::DAG_Node as a
first choice, not implying it's the best or most appropriate.

Solution: Don't know.

More below...


Consider adoption where the adoptee knows their biological parents. This
individual has four parents, two adoptive and two biological; or more
precisely two mothers and two fathers.

This does not fit well into a Tree (with 1 up) but is no problem for a
directed graph between these 5 individuals. Some are connected by an edge
named birth and some by an edge called adoption. A directed graph also
handles the extension of this situation where the adoptive parents have a
biological child as well and/or the biological parent has other children not
put up for adoption or adopted by another family. A strict tree structure
is at best messy for this situation.

A graph traversal, though, can report that this person is my brother by
adoption, not blood, and that this other person is my sister by blood only,
since we were adopted by different families.
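
The labelled-edge idea above can be sketched roughly as follows. This is an illustration in plain Python, not code for the CPAN Graph module; every name in it (the people, parents_by_kind, sibling_kind) is made up, and "sibling" is simplified to mean sharing at least one parent.

```python
from collections import defaultdict

# Each edge runs parent -> child and carries a relationship label.
# All people below are invented for illustration.
EDGES = [
    ("bio_mother", "me", "birth"),
    ("bio_father", "me", "birth"),
    ("adoptive_mother", "me", "adoption"),
    ("adoptive_father", "me", "adoption"),
    ("adoptive_mother", "tom", "birth"),    # biological child of my adoptive parents
    ("adoptive_father", "tom", "birth"),
    ("bio_mother", "ann", "birth"),         # my blood sister, adopted elsewhere
    ("bio_father", "ann", "birth"),
    ("other_mother", "ann", "adoption"),
]


def parents_by_kind(edges):
    """Map child -> relationship label -> set of parents."""
    table = defaultdict(lambda: defaultdict(set))
    for parent, child, kind in edges:
        table[child][kind].add(parent)
    return table


def sibling_kind(edges, a, b):
    """Describe how a and b are related through shared parents."""
    table = parents_by_kind(edges)
    blood_a, blood_b = table[a]["birth"], table[b]["birth"]
    all_a = blood_a | table[a]["adoption"]
    all_b = blood_b | table[b]["adoption"]
    if blood_a & blood_b:
        return "sibling by blood"
    if all_a & all_b:
        return "sibling by adoption"
    return "not siblings"


print(sibling_kind(EDGES, "me", "tom"))
print(sibling_kind(EDGES, "me", "ann"))
```

The point of the sketch is only that the edge label travels with the edge, so a query can tell the two kinds of sibling apart without any special casing in the data structure.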

The problem you will encounter is taking this nuanced graph structure and
squashing it down into a GEDCOM tree, which has a design biased toward
bloodlines over other human connections which people cherish. The directed
graph lets you model the human connections and ignore this bias until it is
time to export the data or create the report.

I could even see the edge having a truth value in the closed interval
[0..1]. For example: I am 70% sure this Percilla Chase is my ancestor and
30% that this Prissy Chase born in the same year one town over is my
ancestor. The edge connecting my ancestor has two sets of birth edges; one
for the 70% connection and one set for the 30% connection. The only-one-up
nature of a tree makes such uncertain/tentative connections difficult to
model.
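
A rough sketch of such weighted edges, in plain Python and under the assumption that each candidate birth edge simply stores its own confidence value. The 0.7 and 0.3 figures are the ones from the example above; the node name "me" and the helper candidate_parents are hypothetical.

```python
# Candidate birth edges as (parent, child, confidence in the closed interval [0, 1]).
CANDIDATES = [
    ("Prissy Chase", "me", 0.3),
    ("Percilla Chase", "me", 0.7),
]


def candidate_parents(edges, child):
    """Return (parent, confidence) pairs for child, most likely first."""
    for _, _, weight in edges:
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"confidence out of range: {weight}")
    found = [(parent, weight) for parent, c, weight in edges if c == child]
    return sorted(found, key=lambda pair: pair[1], reverse=True)


ranked = candidate_parents(CANDIDATES, "me")
print(ranked)
```

Both candidates stay in the graph side by side; a report or export step can then decide whether to show only the top-ranked edge or all of them.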


Weighting factors on edges are definitely nice-to-have, and I would do
nothing to preempt their implementation.


-Original Message-
From: Stephen Woodbridge [mailto:wood...@swoodbridge.com]
Sent: Monday, November 05, 2012 6:23 PM
To: perl-gedcom@perl.org
Subject: Re: The new GEDCOM parser

Ron,

I think this is a graph not a tree, or at best an interconnected forest of
trees. Given a focus node like an individual or a family you can look
at the trees up or down from that node.

In graph theory you have nodes and edges, and you can use Dijkstra's
shortest path to find the shortest route through the graph between the start
and end node. It does this by converting the graph into a tree, but doing so
does not maintain all the nodes because many of them are parallel
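
As an illustration of the shortest-path remark, here is a small Dijkstra sketch in plain Python (not the Graph module's implementation). The predecessor map it returns forms a tree rooted at the start node, and edges that are not on any chosen shortest path are simply left out of that tree. The graph, the node names and the function names are invented for the example.

```python
import heapq


def shortest_path_tree(graph, start):
    """Dijkstra's algorithm; returns (distance, predecessor) dicts.

    graph maps node -> {neighbour: non-negative edge weight}.
    """
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    return dist, prev


def path_to(prev, start, goal):
    """Walk the predecessor tree back from goal (KeyError if unreachable)."""
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical graph with five edges.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}
dist, prev = shortest_path_tree(GRAPH, "A")
print(path_to(prev, "A", "D"), dist["D"])
```

Here the tree keeps three of the five edges; A->C and B->D are dropped because a cheaper route exists.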

The Gedgrave Project

2012-11-06 Thread Nigel Horne

I'm putting together a website where you can upload a Gedcom
(temporarily, it won't be stored) and the site will then
give you links to the burial sites of your ancestors (where known).  It
does this by using the Gedcom CPAN module to parse the data, then it
interrogates sites such as billiongraves.com,
findagrave.com and tombfinder.com.  I have written the first draft
version of the engine which does the matching and am looking for Gedcom
data to test it with.  You can upload data at 
http://nigelhorne.force9.co.uk/~njh,
or if you prefer you can e-mail a Gedcom to me.

Yes - the code will be open sourced (probably on Github) once I've put enough 
data through it to test it.

Thanks,

-Nigel Horne

--
Arranger, Adjudicator, Band Trainer and Clinician, Composer, Tutor, Typesetter.
NJH Music, ICQ#20252325, twitter: @nigelhorne @bbportal @brasscomposer
n...@bandsman.co.uk http://www.bandsman.co.uk



Re: The new GEDCOM parser

2012-11-06 Thread Stephen Woodbridge

Hi Ron,

I totally agree with Jeremy that the parser is the key. Obviously you
need to supply it with test callbacks for verification as you build it.
But once you have that, it should be straightforward for people (or
you) to hook it into, say, Graph, or load it into a database or whatever.


I think having a reader and writer that conform to a grammar is the place
to start.
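
To make the callback idea concrete, here is a minimal sketch in Python (not the proposed Perl parser). It assumes a simplified line grammar of level, optional @XREF@, tag and optional value, which is not the full GEDCOM specification (no CONT/CONC joining, no character-set handling). The names parse_gedcom and on_record and the sample data are hypothetical.

```python
import re

# One simplified GEDCOM line: level, optional @XREF@, tag, optional value.
LINE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_gedcom(lines, on_record):
    """Call on_record(level, xref, tag, value) for each non-blank line."""
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.strip():
            continue
        match = LINE.match(text)
        if match is None:
            raise ValueError(f"line {number}: not a GEDCOM line: {text!r}")
        level, xref, tag, value = match.groups()
        on_record(int(level), xref, tag, value or "")


# Invented sample: one individual linked to one family.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 FAMC @F1@
0 @F1@ FAM
1 CHIL @I1@
""".splitlines()

# A test callback that just records what the parser reports.
seen = []
parse_gedcom(SAMPLE, lambda *record: seen.append(record))
print(seen)
```

A callback that adds nodes and edges to a graph, or inserts rows into a database, would plug into the same on_record slot without touching the parser.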


-Steve

On 11/6/2012 4:49 AM, Ron Savage wrote:

Hi Steve

On 06/11/12 16:11, Stephen Woodbridge wrote:

Hi Ron,

I work with graphs for doing vehicle routing so have some familiarity
with them.


Good.


I think this is the family of graph tool to look at using:

http://search.cpan.org/~jhi/Graph/


Yes, that's the one I earlier said /I/ had trouble with. See a previous msg.

Perhaps I will need to adopt it despite my recent experience!


and I think these work with:

Graph::Writer
Graph::Reader


Yes they do. The author of Graph recommends them.


And it is likely that this can trivially be integrated with graph
rendering tools like GraphViz and/or one of the other tools using this:

http://search.cpan.org/~neilb/Graph-ReadWrite-2.03/lib/Graph/Writer/Dot.pm


I also found a couple of modules that might be interesting to play with:

Graph::Similarity
Graph::Matching


Clearly work has been done to write add-ons for Graph.


If these can be applied to the task of matching and merging overlapping
GEDCOM files.


Interesting idea...


And this module can be used to save and restore Graph structures in
relational databases.


I did not check that deeply into their capabilities.


Ok, this is starting to look very interesting. I guess it all starts
with being able to move data to/from Graph structures and GEDCOM files.

Ron, you got this coded yet? ;)


No. I seem to be spending way too much time answering emails. Hahahaha.

Perhaps the familiarity you mention above will provide you with the
wherewithal to beat me to it :-)). I look forward to your coming upload
to CPAN


-Steve

On 11/5/2012 10:21 PM, Ron Savage wrote:

Hi John

On 06/11/12 12:32, John Washburn wrote:

I agree with Mr. Woodbridge. A directed graph is a better model.


I agree. Conceptually it must be a graph.

Sure. The question is: Is there a module on CPAN which will do the job?

I was a bit careless with the terminology. One issue to do with this
which I have not previously stated is that there is a Graph module on
CPAN: https://metacpan.org/release/Graph

I had a little play with it, and the results were incomprehensible.

Another issue is that the docs are a bit terse, and are (reasonably)
aimed at experts in the field. The author does refer to 'my fiendish
code'.

I'm not an expert on graphs, and find the docs too difficult to follow
to make this module a possible contender.

So I still have the problem as to which CPAN module if any I adopt. I
have no intention to rewrite Graph, so I picked Tree::DAG_Node as a
first choice, not implying it's the best or most appropriate.

Solution: Don't know.

More below...


Consider adoption where the adoptee knows their biological parents. This
individual has four parents, two adoptive and two biological; or more
precisely two mothers and two fathers.

This does not fit well into a Tree (with 1 up) but is no problem for a
directed graph between these 5 individuals. Some are connected by an edge
named birth and some by an edge called adoption. A directed graph also
handles the extension of this situation where the adoptive parents have a
biological child as well and/or the biological parent has other children not
put up for adoption or adopted by another family. A strict tree structure
is at best messy for this situation.

A graph traversal, though, can report that this person is my brother by
adoption, not blood, and that this other person is my sister by blood only,
since we were adopted by different families.

The problem you will encounter is taking this nuanced graph structure and
squashing it down into a GEDCOM tree, which has a design biased toward
bloodlines over other human connections which people cherish. The directed
graph lets you model the human connections and ignore this bias until it is
time to export the data or create the report.

I could even see the edge having a truth value in the closed interval
[0..1]. For example: I am 70% sure this Percilla Chase is my ancestor and
30% that this Prissy Chase born in the same year one town over is my
ancestor. The edge connecting my ancestor has two sets of birth edges; one
for the 70% connection and one set for the 30% connection. The only-one-up
nature of a tree makes such uncertain/tentative connections difficult to
model.


Weighting factors on edges are definitely nice-to-have, and I would do
nothing to preempt their implementation.


-Original Message-
From: Stephen Woodbridge [mailto:wood...@swoodbridge.com]
Sent: Monday, November 05, 2012 6:23 PM
To: perl-gedcom@perl.org
Subject: Re: The new GEDCOM parser

Ron,

I think this is a graph not a tree or at 

Re: The new GEDCOM parser

2012-11-06 Thread Chris Clonch

Hi everybody!

With a graph theory connection, you could easily implement Randy
Wilson's ideas [1] on merging GEDCOMs. I've attempted to give thought
to how this could be implemented with Graph.pm (as seen in O'Reilly's
Mastering Algorithms with Perl) and Gedcom.pm, but I couldn't wrap my
brain around the ins and outs of the data as a graph.


Like Jeremy stated, if a solid parser is written, then it would likely
be trivial (for others :) to extend using graph, tree, whatever for
advanced manipulation of the data. And the reverse could be true:
additional lexers could be written to slurp in XML, FOAF, etc. that come
down the road...


Got me excited again!

-Chris


1. http://synapse.cs.byu.edu/~randy/gen/Remerge.html


On 2012-11-06 00:11, Stephen Woodbridge wrote:

Hi Ron,

I work with graphs for doing vehicle routing so have some familiarity 
with them.


I think this is the family of graph tool to look at using:

http://search.cpan.org/~jhi/Graph/

and I think these work with:

Graph::Writer
Graph::Reader

And it is likely that this can trivially be integrated with graph
rendering tools like GraphViz and/or one of the other tools using this:

http://search.cpan.org/~neilb/Graph-ReadWrite-2.03/lib/Graph/Writer/Dot.pm

I also found a couple of modules that might be interesting to play with:

Graph::Similarity
Graph::Matching

If these can be applied to the task of matching and merging overlapping
GEDCOM files.

And this module can be used to save and restore Graph structures in
relational databases.

Ok, this is starting to look very interesting. I guess it all starts
with being able to move data to/from Graph structures and GEDCOM
files.

Ron, you got this coded yet? ;)

-Steve

On 11/5/2012 10:21 PM, Ron Savage wrote:

Hi John

On 06/11/12 12:32, John Washburn wrote:

I agree with Mr. Woodbridge.  A directed graph is a better model.


I agree. Conceptually it must be a graph.

Sure. The question is: Is there a module on CPAN which will do the job?

I was a bit careless with the terminology. One issue to do with this
which I have not previously stated is that there is a Graph module on
CPAN: https://metacpan.org/release/Graph

I had a little play with it, and the results were incomprehensible.

Another issue is that the docs are a bit terse, and are (reasonably)
aimed at experts in the field. The author does refer to 'my fiendish
code'.

I'm not an expert on graphs, and find the docs too difficult to follow
to make this module a possible contender.

So I still have the problem as to which CPAN module if any I adopt. I
have no intention to rewrite Graph, so I picked Tree::DAG_Node as a
first choice, not implying it's the best or most appropriate.

Solution: Don't know.

More below...

Consider adoption where the adoptee knows their biological parents. This
individual has four parents, two adoptive and two biological; or more
precisely two mothers and two fathers.

This does not fit well into a Tree (with 1 up) but is no problem for a
directed graph between these 5 individuals. Some are connected by an edge
named birth and some by an edge called adoption. A directed graph also
handles the extension of this situation where the adoptive parents have a
biological child as well and/or the biological parent has other children not
put up for adoption or adopted by another family. A strict tree structure
is at best messy for this situation.

A graph traversal, though, can report that this person is my brother by
adoption, not blood, and that this other person is my sister by blood only,
since we were adopted by different families.

The problem you will encounter is taking this nuanced graph structure and
squashing it down into a GEDCOM tree, which has a design biased toward
bloodlines over other human connections which people cherish. The directed
graph lets you model the human connections and ignore this bias until it is
time to export the data or create the report.

I could even see the edge having a truth value in the closed interval
[0..1]. For example: I am 70% sure this Percilla Chase is my ancestor and
30% that this Prissy Chase born in the same year one town over is my
ancestor. The edge connecting my ancestor has two sets of birth edges; one
for the 70% connection and one set for the 30% connection. The only-one-up
nature of a tree makes such uncertain/tentative connections difficult to
model.


Weighting factors on edges are definitely nice-to-have, and I would do
nothing to preempt their implementation.


-Original Message-
From: Stephen Woodbridge [mailto:wood...@swoodbridge.com]
Sent: Monday, November 05, 2012 6:23 PM
To: perl-gedcom@perl.org
Subject: Re: The new GEDCOM parser

Ron,

I think this is a graph not a tree, or at best an interconnected forest of
trees. Given a focus node like an individual or a family you can look
at the trees up or down from that node.

In graph theory you have nodes and edges, and you can use Dijkstra's
shortest path to find 

Re: The new GEDCOM parser

2012-11-06 Thread Ron Savage

Hi Chris

On 07/11/12 06:27, Chris Clonch wrote:

Hi everybody!

With a graph theory connection, you could easily implement Randy
Wilson's ideas [1] on merging GEDCOMs. I've attempted to give thought to
how this could be implemented with Graph.pm (as seen in O'Reilly's
Mastering Algorithms with Perl) and Gedcom.pm, but I couldn't wrap my
brain around the ins and outs of the data as a graph.


Guess I'll have to reassess the appropriateness of Graph.pm. I may well 
have to use it.



Like Jeremy stated, if a solid parser is written, then it would likely
be trivial (for others :) to extend using graph, tree, whatever for
advanced manipulation of the data. And the reverse could be true:
additional lexers could be written to slurp in XML, FOAF, etc. that come
down the road...

Got me excited again!


Good!


1. http://synapse.cs.byu.edu/~randy/gen/Remerge.html


Thanx for the reference.

--
Ron Savage
http://savage.net.au/
Ph: 0421 920 622