Dear Stephen:

I have a Perl program that walks the HTML pages of e-FamilyTree.net and writes the data out to a GEDCOM file. It is also interruptible: you can run the program for a while and stop it, and when you restart it, it picks up where the search left off.
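The resume-where-you-left-off behaviour can be sketched roughly as below. This is only a minimal illustration, not the actual program. It assumes a breadth-first queue of page URLs plus a "seen" set, checkpointed with the core Storable module after every page. The sub names, the state-file argument, and the `$fetch_links`/`$emit` callbacks are hypothetical stand-ins for the real HTML parsing and GEDCOM writing.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Load the saved queue/seen set if a previous run was interrupted,
# otherwise start a fresh crawl from the seed URL.
sub load_state {
    my ($state_file, $seed) = @_;
    return retrieve($state_file) if -e $state_file;
    return { queue => [$seed], seen => { $seed => 1 } };
}

# Process pages one at a time, checkpointing after each one so that a
# restart resumes with the next unprocessed page.
#   $fetch_links : hypothetical callback returning the links on a page
#   $emit        : hypothetical callback that writes the page's records
#   $max_pages   : optional limit, handy for stopping a run early
sub crawl {
    my ($state_file, $seed, $fetch_links, $emit, $max_pages) = @_;
    my $state = load_state($state_file, $seed);
    my $done  = 0;
    while (@{ $state->{queue} }
           && (!defined $max_pages || $done < $max_pages)) {
        my $url = shift @{ $state->{queue} };
        $emit->($url);
        for my $link ($fetch_links->($url)) {
            next if $state->{seen}{$link}++;    # queue each page only once
            push @{ $state->{queue} }, $link;
        }
        store($state, $state_file);             # checkpoint after every page
        $done++;
    }
    return $state;
}
```

One design caveat: if the program is killed after `$emit` runs but before `store` completes, that page is processed again on restart, so a version that appends to the GEDCOM file would want to write the output and the checkpoint together.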
It uses HTML::TreeBuilder and the wonderful look_down() method mentioned by Ron. It also uses Date::Manip quite heavily. I would be happy to zip up the code and send you the archive if you are interested. The code is a bit bloated because it grew over time, and I had to add special cases to handle some of the errors in the e-FamilyTree.net HTML structure and to implement the one-generation look-ahead.

-----Original Message-----
From: Ron Savage [mailto:r...@savage.net.au]
Sent: Saturday, December 29, 2012 10:06 PM
To: perl-gedcom@perl.org
Subject: Re: Gedcom.pm 1.17 released

Hi Stephen

On 30/12/12 11:03, Stephen Woodbridge wrote:
> On 12/29/2012 5:26 PM, Paul Johnson wrote:
> What I noticed was that the data was nicely tagged in the HTML, so I am
> writing a parser to read the HTML and generate a GEDCOM file. I have
> the basics working, but I have to do more work on it to fix bugs and
> collect more of the data than I currently am. I'm sidetracked with work
> at the moment, so it is on hold. When I'm done it will have generated a
> 40K+ person GEDCOM file. This should be able to create a GEDCOM file
> from any "Second Site" generated website, assuming it is similar to the
> link above. Or you can ask the site owner for a copy of the GEDCOM :),
> but this seemed like a worthwhile challenge at the time.

Are you using HTML::TreeBuilder and the v-e-r-y nice look_down() method?

--
Ron Savage
http://savage.net.au/
Ph: 0421 920 622