Re: The new GEDCOM parser
Hi Steve On 06/11/12 16:11, Stephen Woodbridge wrote: Hi Ron, I work with graphs for doing vehicle routing, so I have some familiarity with them. Good. I think this is the family of graph tools to look at using: http://search.cpan.org/~jhi/Graph/ Yes, that's the one I said earlier /I/ had trouble with. See a previous msg. Perhaps I will need to adopt it despite my recent experience! And I think these work with it: Graph::Writer, Graph::Reader. Yes, they do. The author of Graph recommends them. And it is likely that this can trivially be integrated with graph rendering tools like GraphViz and/or one of the other tools using this: http://search.cpan.org/~neilb/Graph-ReadWrite-2.03/lib/Graph/Writer/Dot.pm I also found a couple of modules that might be interesting to play with: Graph::Similarity, Graph::Matching. Clearly work has been done to write add-ons for Graph. Maybe these can be applied to the task of matching and merging overlapping GEDCOM files. Interesting idea... And this module can be used to save and restore Graph structures in relational databases. I did not check that deeply into their capabilities. Ok, this is starting to look very interesting. I guess it all starts with being able to move data to/from Graph structures and GEDCOM files. Ron, you got this coded yet? ;) No. I seem to be spending way too much time answering emails. Hahahaha. Perhaps the familiarity you mention above will provide you with the wherewithal to beat me to it :-)). I look forward to your coming upload to CPAN. -Steve On 11/5/2012 10:21 PM, Ron Savage wrote: Hi John On 06/11/12 12:32, John Washburn wrote: I agree with Mr. Woodbridge. A directed graph is a better model. I agree. Conceptually it must be a graph. Sure. The question is: Is there a module on CPAN which will do the job? I was a bit careless with the terminology.
One issue to do with this, which I have not previously stated, is that there is a Graph module on CPAN: https://metacpan.org/release/Graph I had a little play with it, and the results were incomprehensible. Another issue is that the docs are a bit terse, and are (reasonably) aimed at experts in the field. The author does refer to 'my fiendish code'. I'm not an expert on graphs, and find the docs too difficult to follow to make this module a possible contender. So I still have the problem of which CPAN module, if any, to adopt. I have no intention of rewriting Graph, so I picked Tree::DAG_Node as a first choice, without implying it's the best or most appropriate. Solution: Don't know. More below... Consider adoption where the adoptee knows their biological parents. This individual has four parents, two adoptive and two biological; or, more precisely, two mothers and two fathers. This does not fit well into a tree (with only one parent link up from each node) but is no problem for a directed graph between these five individuals. Some are connected by an edge named birth and some by an edge named adoption. A directed graph also handles the extension of this situation where the adoptive parents have a biological child as well, and/or a biological parent has other children who were not put up for adoption or who were adopted by another family. A strict tree structure is at best messy for this situation. A graph traversal, though, can report that this person is my brother by adoption, not blood, and that this other person is my sister by blood only, since we were adopted by different families. The problem you will encounter is taking this nuanced graph structure and squashing it down into a GEDCOM tree, whose design is biased toward bloodlines over the other human connections which people cherish. The directed graph lets you model the human connections and ignore this bias until it is time to export the data or create the report. I could even see each edge having a truth value in the closed interval [0, 1].
For example: I am 70% sure this Percilla Chase is my ancestor and 30% sure that this Prissy Chase, born in the same year one town over, is my ancestor. The edge connecting my ancestor has two sets of birth edges: one set for the 70% connection and one set for the 30% connection. The 'only one up' nature of a tree makes such uncertain or tentative connections difficult to model. Weighting factors on edges are definitely nice-to-have, and I would do nothing to preempt their implementation. -Original Message- From: Stephen Woodbridge [mailto:wood...@swoodbridge.com] Sent: Monday, November 05, 2012 6:23 PM To: perl-gedcom@perl.org Subject: Re: The new GEDCOM parser Ron, I think this is a graph, not a tree, or at best an interconnected forest of trees. Given a focus node like an individual or a family, you can look at the trees up or down from that node. In graph theory you have nodes and edges, and you can use Dijkstra's shortest path algorithm to find the shortest route through the graph between the start and end nodes. It does this by converting the graph into a tree, but doing so does not maintain all the nodes because many of them are parallel
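John's adoption and 70%/30% examples can be made concrete with a small directed graph in which every parent-to-child edge carries a kind ("birth" or "adoption") and a confidence in [0, 1]. The following is a minimal sketch in Python rather than Perl, and it does not use the CPAN Graph module; the class and method names (FamilyGraph, add_parent, sibling_kind, most_probable_line) and every person other than the two Chases are hypothetical. It also assumes that confidences on different edges are independent, so a line of descent is scored by their product; the most probable line is then found with Dijkstra's algorithm over costs of -log(confidence), which turns "largest product" into "smallest sum".

```python
import heapq
import math
from collections import defaultdict


class FamilyGraph:
    """Hypothetical directed graph of people. Each parent -> child edge
    carries a kind ("birth" or "adoption") and a confidence in [0, 1]."""

    def __init__(self):
        # child -> list of (parent, kind, confidence)
        self._parents = defaultdict(list)

    def add_parent(self, parent, child, kind="birth", confidence=1.0):
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in the closed interval [0, 1]")
        self._parents[child].append((parent, kind, confidence))

    def parents(self, child, kind=None):
        """Set of parents of child, optionally restricted to one edge kind."""
        return {p for p, k, _ in self._parents.get(child, ())
                if kind is None or k == kind}

    def sibling_kind(self, a, b):
        """'blood' if a and b share a birth parent, 'adoption' if they share
        a parent but not a birth parent, otherwise None."""
        if self.parents(a, "birth") & self.parents(b, "birth"):
            return "blood"
        if self.parents(a) & self.parents(b):
            return "adoption"
        return None

    def most_probable_line(self, descendant, ancestor, kind=None):
        """Dijkstra over child -> parent edges with cost -log(confidence).
        The cheapest path is the line of descent whose confidences have the
        largest product (assuming the edges are independent)."""
        best = {descendant: 0.0}
        prev = {}
        heap = [(0.0, descendant)]
        while heap:
            cost, node = heapq.heappop(heap)
            if node == ancestor:
                break
            if cost > best[node]:
                continue  # stale heap entry
            for parent, k, conf in self._parents.get(node, ()):
                if conf <= 0.0 or (kind is not None and k != kind):
                    continue  # impossible or wrong-kind links are skipped
                new_cost = cost - math.log(conf)
                if new_cost < best.get(parent, math.inf):
                    best[parent] = new_cost
                    prev[parent] = node
                    heapq.heappush(heap, (new_cost, parent))
        if ancestor not in best:
            return None, 0.0
        path = [ancestor]
        while path[-1] != descendant:
            path.append(prev[path[-1]])
        return path[::-1], math.exp(-best[ancestor])


# Hypothetical example: "Me" was adopted and knows both birth parents.
g = FamilyGraph()
g.add_parent("BioMum", "Me", "birth")
g.add_parent("BioDad", "Me", "birth")
g.add_parent("AdoptiveMum", "Me", "adoption")
g.add_parent("AdoptiveDad", "Me", "adoption")
g.add_parent("AdoptiveMum", "Brother", "birth")  # adoptive parents' own child
g.add_parent("BioMum", "Sister", "birth")        # shares a birth parent only

# Tentative links from John's example: 70% vs 30%.
g.add_parent("Percilla Chase", "BioMum", "birth", 0.7)
g.add_parent("Prissy Chase", "BioMum", "birth", 0.3)
g.add_parent("Elder Chase", "Percilla Chase", "birth")  # hypothetical common ancestor
g.add_parent("Elder Chase", "Prissy Chase", "birth")

print(g.sibling_kind("Me", "Brother"))  # adoption
print(g.sibling_kind("Me", "Sister"))   # blood
path, p = g.most_probable_line("Me", "Elder Chase", kind="birth")
print(path, round(p, 2))  # ['Me', 'BioMum', 'Percilla Chase', 'Elder Chase'] 0.7
```

Keeping the kind and confidence on the edge, rather than on the person, is what lets the same individual have birth and adoptive parents side by side and lets tentative links coexist until the data is exported.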
The Gedgrave Project
I'm putting together a website where you can upload a Gedcom (temporarily, it won't be stored) and the site will then give you links to the burial sites of your ancestors (where known). It does this by using the Gedcom CPAN module to parse the data, then it interrogates sites such as billiongraves.com, findagrave.com and tombfinder.com. I have written the first draft version of the engine which does the matching and am looking for Gedcom data to test it with. You can upload data at http://nigelhorne.force9.co.uk/~njh, or if you prefer you can e-mail a Gedcom to me. Yes - the code will be open sourced (probably on Github) once I've put enough data through it to test it. Thanks, -Nigel Horne -- Arranger, Adjudicator, Band Trainer and Clinician, Composer, Tutor, Typesetter. NJH Music, ICQ#20252325, twitter: @nigelhorne @bbportal @brasscomposer n...@bandsman.co.uk http://www.bandsman.co.uk
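As a rough illustration of the first step Nigel describes (parsing the GEDCOM before anything is looked up), here is a Python sketch that collects each individual's name and burial date and place from INDI records that have a BURI event. It is not Nigel's engine, which uses the Perl Gedcom module, and it does not query billiongraves.com, findagrave.com or tombfinder.com; the function name burials and the sample record are made up, and it assumes the standard NAME, BURI, DATE and PLAC tags.

```python
import re

# One GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def burials(lines):
    """Return {xref: {"name", "buried", "date", "place"}} for every INDI
    record that has a BURI event. Malformed lines are silently skipped."""
    people = {}
    current = None   # record for the INDI being read, if any
    in_buri = False  # True while reading the sub-lines of a BURI event
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue
        level, xref, tag, value = int(m.group(1)), m.group(2), m.group(3), m.group(4)
        if level == 0:
            in_buri = False
            current = None
            if tag == "INDI":
                current = people.setdefault(
                    xref, {"name": None, "buried": False, "date": None, "place": None})
            continue
        if current is None:
            continue
        if level == 1:
            in_buri = tag == "BURI"
            if tag == "NAME" and current["name"] is None:
                # GEDCOM marks the surname with slashes: "Ann /Example/"
                current["name"] = (value or "").replace("/", "").strip()
            elif in_buri:
                current["buried"] = True
        elif level == 2 and in_buri:
            if tag == "DATE":
                current["date"] = value
            elif tag == "PLAC":
                current["place"] = value
    return {x: p for x, p in people.items() if p["buried"]}


SAMPLE = """0 @I1@ INDI
1 NAME Ann /Example/
1 BURI
2 DATE 1890
2 PLAC Exampleton
0 @I2@ INDI
1 NAME Living /Person/
0 TRLR""".splitlines()

found = burials(SAMPLE)
print(found)
```

The name, date and place collected this way are the kind of details a matching engine could then use as search terms against the grave sites.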
Re: The new GEDCOM parser
Hi Ron, I totally agree with Jeremy that the parser is the key. Obviously you need to supply it with test callbacks for verification as you build it. But once you have that, it should be straightforward for people (or you) to hook it into, say, Graph, or load it into a database or whatever. I think having a reader and writer that conform to a grammar is the place to start. -Steve
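Steve's suggestion, a parser verified through test callbacks plus a writer that follows the same grammar, might look like this minimal Python sketch. It assumes only the basic GEDCOM line grammar (level, optional @XREF@, tag, optional value) and deliberately ignores CONC/CONT continuation lines, character encodings and line-length limits; parse, format_line and build_tree are hypothetical names, not an existing CPAN or Python API.

```python
import re

# One GEDCOM line: level, optional @XREF@, tag, optional value,
# e.g. "0 @I1@ INDI" or "1 NAME John /Smith/".
LINE_RE = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\w+)(?: (.*))?$")


def parse(lines, on_line):
    """Call on_line(level, xref, tag, value) for each GEDCOM line.
    Raises ValueError for lines that do not match the grammar or that jump
    more than one level deeper than the previous line."""
    previous = -1
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n").lstrip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"line {number}: not a GEDCOM line: {line!r}")
        level = int(m.group(1))
        if level > previous + 1:
            raise ValueError(f"line {number}: level {level} follows level {previous}")
        on_line(level, m.group(2), m.group(3), m.group(4))
        previous = level


def format_line(level, xref, tag, value):
    """Writer side: the inverse of the line grammar used by parse()."""
    parts = [str(level)]
    if xref:
        parts.append(xref)
    parts.append(tag)
    if value is not None:
        parts.append(value)
    return " ".join(parts)


def build_tree(lines):
    """Example callback consumer: nest lines into records with a level stack."""
    roots, stack = [], []

    def on_line(level, xref, tag, value):
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        while stack and stack[-1][0] >= level:
            stack.pop()
        (stack[-1][1]["children"] if stack else roots).append(node)
        stack.append((level, node))

    parse(lines, on_line)
    return roots


SAMPLE = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1900
1 FAMC @F1@
0 TRLR""".splitlines()

events = []
parse(SAMPLE, lambda *fields: events.append(fields))  # a simple test callback
tree = build_tree(SAMPLE)
round_trip = [format_line(*e) for e in events]
print(round_trip == SAMPLE)  # True
```

Because parse() reports each line only through the callback, the same reader could feed a tree, a Graph object or a database loader, and a reader/writer round trip gives a cheap first test of the grammar.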
Re: The new GEDCOM parser
Hi everybody! With a graph theory connection, you could easily implement Randy Wilson's ideas [1] on merging GEDCOMs. I've attempted to give thought to how this could be implemented with Graph.pm (as seen in O'Reilly's Algorithms with Perl) and Gedcom.pm, but I couldn't wrap my brain around the ins and outs of the data as a graph. Like Jeremy stated, if a solid parser is written, then it would likely be trivial (for others :) to extend it using a graph, tree, whatever, for advanced manipulation of the data. And the reverse could be true: additional lexers could be written to slurp in XML, FOAF, etc. as they come down the road... Got me excited again! -Chris 1. http://synapse.cs.byu.edu/~randy/gen/Remerge.html
Re: The new GEDCOM parser
Hi Chris On 07/11/12 06:27, Chris Clonch wrote: Hi everybody! With a graph theory connection, you could easily implement Randy Wilson's ideas [1] on merging GEDCOMs. I've attempted to give thought to how this could be implemented with Graph.pm (as seen in O'Reilly's Algorithms with Perl) and Gedcom.pm, but I couldn't wrap my brain around the ins and outs of the data as a graph. Guess I'll have to reassess the appropriateness of Graph.pm. I may well have to use it. Like Jeremy stated, if a solid parser is written, then it would likely be trivial (for others :) to extend it using a graph, tree, whatever, for advanced manipulation of the data. And the reverse could be true: additional lexers could be written to slurp in XML, FOAF, etc. as they come down the road... Got me excited again! Good! 1. http://synapse.cs.byu.edu/~randy/gen/Remerge.html Thanx for the reference. -- Ron Savage http://savage.net.au/ Ph: 0421 920 622