These are important questions. IMHO, a GEDCOM date parser should (1) distinguish between these cases, (2) indicate to the caller whether the date has any special attributes (EST, ABT, etc.), and (3) return a date in a normalised format using the native GEDCOM syntax, with no fabricated information. Any conversion to one of the confusion of standard internet date/time formats could be handled by a separate call, once it's known whether the date is exact and complete (e.g. 1 JAN 2000).
The difference between these cases is fundamental to genealogy. We should be trying to represent what we actually know. If I know only that a child was born in the year 2000, I record "2000" in the GEDCOM date. I don't want tools to fabricate a month and day.
To expand on (2): when I last studied the GEDCOM specification some years ago, I was left with the impression that GEDCOM's attempt to codify the various possible attributes that a date might have was a little sloppy. I doubt that the syntax quite allows one to express everything in a date that one might want (for example, I might know that an ancestor was born on 4 JUL, yet be uncertain about the year, but I don't think that "4 JUL ABT 1860" is allowed). I also have the feeling that the GEDCOM spec's explanation of the semantics of these attributes is not complete. Add to this the issues of language (and what do you do if a language comes along in which a month has the name "ABT"?) and of calendar (Gregorian, Julian, or even Shire Reckoning for any hobbit genealogists out there), and it's clear that comprehensively dealing with all the possible attributes that a date could have will be far from trivial. But I'd have to hope that a community effort would eventually cover most of the important cases.
To expand on (3): if you're developing a tool to handle date information in GEDCOM files from a variety of sources, the place you need the most help is actually parsing the date. You want to know whether the date is syntactically valid, whether it's approximate (and there are various kinds of approximate), and whether the day, month and/or year are supplied. You'd like to know what language and calendar it's in. You probably want to write it out again, normalised for capitalisation, whitespace and language (and calendar, but that might be hard).
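To make that concrete, here is a rough sketch in Python of the kind of interface I have in mind. It is not taken from any existing module, and every name in it (GedcomDate, parse_gedcom_date, to_iso) is made up for illustration. It assumes single Gregorian dates with English month abbreviations and only the ABT, CAL and EST qualifiers; ranges, periods, INT, other languages and other calendars are all left out.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Optional qualifier, then optional "day month" or "month", then a year.
DATE_RE = re.compile(
    r"^(?:(?P<qual>ABT|CAL|EST)\s+)?"
    r"(?:(?:(?P<day>\d{1,2})\s+)?(?P<month>[A-Z]{3})\s+)?"
    r"(?P<year>\d{1,4})$"
)


@dataclass(frozen=True)
class GedcomDate:
    """Hypothetical parse result: absent parts stay None, nothing is invented."""
    qualifier: Optional[str]
    day: Optional[int]
    month: Optional[str]
    year: int

    @property
    def is_exact(self) -> bool:
        # Exact means unqualified with day, month and year all supplied.
        return self.qualifier is None and self.day is not None

    def normalised(self) -> str:
        # Native GEDCOM syntax, upper case, single spaces.
        parts = [self.qualifier, self.day, self.month, self.year]
        return " ".join(str(p) for p in parts if p is not None)


def parse_gedcom_date(text: str) -> Optional[GedcomDate]:
    """Return a GedcomDate, or None if the text is not in the supported subset."""
    m = DATE_RE.match(" ".join(text.split()).upper())
    if m is None or (m["month"] and m["month"] not in MONTHS):
        return None
    day = int(m["day"]) if m["day"] else None
    if day is not None and not 1 <= day <= 31:
        return None
    return GedcomDate(m["qual"], day, m["month"], int(m["year"]))


def to_iso(d: GedcomDate) -> Optional[str]:
    """Separate conversion step: only exact, complete dates are converted."""
    if not d.is_exact:
        return None
    try:
        return date(d.year, MONTHS.index(d.month) + 1, d.day).isoformat()
    except ValueError:  # e.g. 31 FEB, which the loose day check lets through
        return None


for raw in ["2000", "jan  2000", "1 JAN 2000", "abt 1860", "4 JUL ABT 1860"]:
    parsed = parse_gedcom_date(raw)
    print(repr(raw), "->", parsed.normalised() if parsed else None,
          "| ISO:", to_iso(parsed) if parsed else None)
```

The point of the split is that the parser only ever reports what was written, and the caller has to ask separately (and can be refused) before getting a full calendar date.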
A couple of years ago, I wrote a little tool to extract all exact dates of birth, marriage and death events from a GEDCOM file, and write them out as a calendar file (ical format, RFC 2445). I was using Paul Johnson's Gedcom package from CPAN. I started out using Date::Manip to parse dates (partly because the Gedcom package was already using it), but I ended up having to write my own parser. The problem was that Date::Manip's parser would fabricate days ("JAN 2000" would come back as "1 JAN 2000"). Paul Johnson's date normalisation routine suffered from the same problem, because it too relied on Date::Manip. A pity, because date normalisation would be very helpful when you're comparing information about an event from two GEDCOM files.
Thanks for starting an unusually interesting discussion.
Stephen

On 20 September 2011 05:35, Ron Savage <r...@savage.net.au> wrote:
>
> On Thu, 2011-09-15 at 12:45 +0200, Eugene van der Pijll wrote:
>
> > * How does your module record the difference between "2000", "JAN 2000"
> > and "1 JAN 2000"?
>
> How does you code distinguish between these cases?
>
> Do you one of the GEDCOM concepts About, Calculated, Estimated or
> Interpreted?
>
> Do people want to know that part of a date has been fabricated by the
> code?
>