Hi Ron,
On 05/08/2011, at 02.05, Ron Savage wrote:
> Hi Mikkel
> 
> On Thu, 2011-08-04 at 22:44 +0200, Mikkel Eide Eriksen wrote:
>> Hi Ron,
>> 
>> Speaking of genealogy formats, I'm working some on a completely source-based 
>> format: http://carthag.github.com/sourcemarkup/ (don't mind the ugly color 
>> scheme, it was just a random one I chose).
> 
> I'm glad to know you're doing this sort of work, but I have a complex
> set of reactions to it:
> 
> o Isn't Cocoa an Apple-proprietary software thing? This implies anyone
> trying to use data in this format outside the Apple cocoon must have a
> separate set of code to import and manipulate the data. Who's going to
> write that? Only someone who chooses to support your format.

There are a number of open source technologies that would help, such as GNUstep 
and cocotron.org - but mostly I'm writing it in Cocoa since I'm a little fed up 
with Mac genealogy software and want to try my hand at writing a free, open app 
myself.

> o XML definitely handles nested data, so it can certainly be used as a
> communication format between users, but is the idea to keep the data in
> XML at all times? This requires an XML parser (which is a big topic I
> don't wish to pursue), and either using something like XML::Twig to
> access small parts of the file, or storing all the XML in a DOM-based
> structure, which normally takes up 100 (sic) times the space of the file
> itself (another big topic). This in turn leads to a discussion of speed
> of access for practical web-based display, and hence deployment under
> web servers such as Starman so that the code never exits, meaning the
> slow startup costs for the XML processor, etc, are avoided.
> 
> o I'll say again I understand XML has its uses, but its proponents are
> still trying to live down the XML fanaticism of the early days, when
> every little thing was put in a XML file (the format of choice for
> control freaks :-), which required a huge parser to be fired up just to
> read even a 3 line file. As always, it's up to the proponents to support
> their suggestions, rather than choosing it first and then afterwards
> claiming it's appropriate. That applies to my suggestions too :-(!.

The format is not for internal storage, but for the exchange of data between 
apps/users/services. So the extra cost of an XML parser (which isn't so bad 
these days) is only incurred when actually exporting or importing data from 
whatever internal format is used. In my app I'm using Core Data to handle the 
data, which provides referential integrity, predicate-based fetches, etc. 
Someone else could use Postgres, or Perl hashes, or whatever they want.
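To make the import/export boundary concrete, here is a minimal sketch in Python. The XML sample and its element names ("source", "person") are invented for illustration - they are not the actual sourcemarkup schema - but the point stands: the XML parser is touched exactly once, at import, and from then on the app works on plain native structures:

```python
# Sketch only: the element names ("source", "person") are hypothetical,
# not taken from the actual sourcemarkup definition.
import xml.etree.ElementTree as ET

SAMPLE = """
<source id="source1" quality="primary">
  <transcription>Born to <person id="p1" role="father">Jens Hansen</person>
  and <person id="p2" role="mother">Maren Jensdatter</person>.</transcription>
</source>
"""

def import_source(xml_text):
    """Parse the exchange format once, at the boundary, into plain dicts.
    After this, the rest of the app never needs the XML parser again."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "quality": root.get("quality"),
        "persons": [
            {"id": p.get("id"), "role": p.get("role"), "name": p.text}
            for p in root.iter("person")
        ],
    }

record = import_source(SAMPLE)
print(record["persons"][0]["name"])  # Jens Hansen
```

Exporting would be the mirror image: walk whatever internal structure you have (Core Data, Postgres rows, Perl hashes) and serialize back out once.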

Since the format is transcribed text of sources with a set of semantic metadata 
on top, it lends itself to a markup language, hence I've chosen XML. That is 
only required for the transcriptions themselves. It might be possible to do a 
mixed format where only the transcription itself was XML, and the "external" 
info (source quality, crossrefs, etc.) was in some other format, but in my 
opinion that would needlessly complicate things.
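For a sense of what "transcribed text with semantic metadata on top" could look like, here is a purely illustrative sketch. The element and attribute names are hypothetical, not the actual sourcemarkup schema:

```xml
<!-- Hypothetical sketch only; not the real sourcemarkup schema -->
<source id="source1" type="parish-register" quality="primary">
  <transcription>
    Born on the 4th of March 1851 to
    <person id="p1" role="father">Jens Hansen</person> and
    <person id="p2" role="mother">Maren Jensdatter</person>,
    a son, <person id="p3" role="child">Hans Jensen</person>.
  </transcription>
</source>
```

The transcription stays readable as plain prose, while the inline markup makes every claim (names, roles, the event itself) traceable back to this exact source.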

> o So why go your own way anyway? Why not join the - very interesting -
> Better Gedcom group? I do thank you for the reference. We should all
> think about how that group and we Perl users can interact.

I've actually requested access to the BetterGedcom wiki (which I only 
discovered after starting my own thing), but have not yet had time to follow up 
on that. It seems like there are a lot of people out there fed up with the 
idiosyncrasies and shortcomings of the gedcom format. Hopefully we can create 
something extraordinary :-)

>> The idea is to use transcribed sources and mark them up with all info that 
>> is contained therein, so as to force all information to be referable to an 
>> actual source. From this data, it should be possible to build family trees, 
>> data sheets for individuals, etc.  It is still very much a work in progress 
>> and is just at this point an idea and a very fluid definition of what I want 
>> it to be able to do. The site has two unrelated examples, a birth (source1, 
>> recorded as prose) and a marriage (source2, recorded in a table).
> 
> It's good you've directly focused on one of the major issues - how to
> handle textual material.
> 
> I should say I have a strong suspicion an ideal solution (if there is
> one!) will end up being:
> 
> o Have basic info (individuals, families, and hence relationships
> [i/f/r]) in a db (i.e. such as Postgres). This means rapid access in a
> viewport-like way so as to display a fragment of a family tree in a web
> page, and
> 
> o Have all other material in either (potentially huge) text/binary
> fields in the db, or even in external files, all accessible via the
> i/f/r records.

This is internal storage, which hasn't concerned me as much. I'm much more 
interested in the exchange of gedcom data and how to accomplish this in a clean 
and unambiguous manner. That said, splitting the data along those lines seems 
sound.
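The split Ron describes can be sketched quickly with SQLite standing in for Postgres: small i/f/r tables for fast viewport-style queries, with the bulky transcription text in a separate table keyed to them. All table and column names here are invented for illustration:

```python
# Sketch of the "i/f/r records plus big text fields" split.
# Schema names are invented; SQLite stands in for Postgres.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE relationship (
    parent_id INTEGER REFERENCES individual(id),
    child_id  INTEGER REFERENCES individual(id)
);
-- Potentially huge source material lives apart from the small i/f/r tables.
CREATE TABLE source_text (
    individual_id INTEGER REFERENCES individual(id),
    transcription TEXT
);
""")
con.execute("INSERT INTO individual VALUES (1, 'Jens Hansen'), (2, 'Hans Jensen')")
con.execute("INSERT INTO relationship VALUES (1, 2)")
con.execute("INSERT INTO source_text VALUES (2, 'Born 4 March 1851 ...')")

# A viewport query for the tree display touches only the small tables:
row = con.execute("""
    SELECT p.name, c.name FROM relationship r
    JOIN individual p ON p.id = r.parent_id
    JOIN individual c ON c.id = r.child_id
""").fetchone()
print(row)  # ('Jens Hansen', 'Hans Jensen')
```

The transcriptions are only fetched when someone drills down into a particular individual, so rendering a tree fragment never pays for the big text fields.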

>> Obviously it would be impossible to generate this data from a gedcom file, 
>> but it would be possible to (lossily) export from this format to gedcom.
> 
> Sure, but the whole point of my current attempt to stir people into
> responding is to think outside the 'Gedcom-ordained square', and to
> focus on what's needed, not what was defined in the past.

Agreed, but even if/when we do think up the be-all, end-all genealogy file 
format of the future, there would still be a transitional period during which 
people would need to interact with gedcom :-)

>> Additionally, this might also interest you, I came across it last month: 
>> http://bettergedcom.wikispaces.com/
> 
> Yes, indeed. Probably they're way ahead of me on this matter... I'd
> better lie low until I study their material.

Let me know what you think! 

Mikkel

> -- 
> Ron Savage
> http://savage.net.au/
> Ph: 0421 920 622
> 
