These are important questions.

IMHO, a GEDCOM date parser should (1) distinguish between these cases,
(2) indicate to the caller whether the date has any special attributes
(EST, ABT, etc.), and (3) return a date in a normalised format using the
native GEDCOM syntax with no fabricated information. Any conversion
to one of the confusion of standard internet date/time formats could
be handled by a separate call, once it's known whether the date is
exact and complete (e.g. 1 JAN 2000).
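
To make (1) and (2) concrete, here is a minimal sketch in Python of a
parse result that keeps missing parts missing. The names GedcomDate and
parse_gedcom_date are hypothetical, and the sketch assumes English month
abbreviations and only the ABT/CAL/EST qualifiers; ranges, periods and
calendar escapes are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

@dataclass
class GedcomDate:
    year: int
    month: Optional[str] = None      # None means "not supplied"
    day: Optional[int] = None        # None means "not supplied"
    qualifier: Optional[str] = None  # "ABT", "CAL", "EST" or None

    @property
    def precision(self) -> str:
        # Report how much of the date was actually written down.
        if self.day is not None:
            return "day"
        if self.month is not None:
            return "month"
        return "year"

    @property
    def is_exact(self) -> bool:
        # Exact and complete: full day/month/year and no qualifier.
        return self.qualifier is None and self.precision == "day"

# Optional qualifier, then optional "[day] month", then a year.
_DATE_RE = re.compile(
    r"^(?:(?P<qual>ABT|CAL|EST)\s+)?"
    r"(?:(?:(?P<day>\d{1,2})\s+)?(?P<month>[A-Z]{3})\s+)?"
    r"(?P<year>\d{3,4})$"
)

def parse_gedcom_date(text: str) -> Optional[GedcomDate]:
    """Return a GedcomDate, or None if this sketch cannot parse the text."""
    m = _DATE_RE.match(" ".join(text.upper().split()))
    if not m:
        return None
    if m["month"] is not None and m["month"] not in MONTHS:
        return None
    day = int(m["day"]) if m["day"] is not None else None
    if day is not None and not 1 <= day <= 31:
        return None
    return GedcomDate(int(m["year"]), m["month"], day, m["qual"])
```

With this shape, "2000", "JAN 2000" and "1 JAN 2000" come back with
precision "year", "month" and "day" respectively, and nothing is
invented for the parts that were never supplied.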

The difference between these cases is fundamental to genealogy. We
should be trying to represent what we actually know. If I know only
that a child was born in the year 2000, I record "2000" in the GEDCOM
date. I don't want tools to fabricate a month and day.

To expand on (2). When I last studied the GEDCOM specification some
years ago, I was left with the impression that GEDCOM's attempt to
codify the various possible attributes that a date might have was a
little sloppy. I doubt that the syntax quite allows one to express
everything in a date that one might want (for example, I might know
that an ancestor was born on 4 JUL, yet be uncertain about the year,
but I don't think that "4 JUL ABT 1860" is allowed). I also have the
feeling that the GEDCOM spec's explanation of the semantics of these
attributes is not complete.

Add to this the issues of language (what do you do if a language
comes along in which a month has the name "ABT"?) and calendar
(Gregorian, Julian, or even Shire Reckoning for any hobbit
genealogists out there), and it's clear that dealing comprehensively
with all the possible attributes that a date could have will be far
from trivial. But I'd hope that a community effort would
eventually cover most of the important cases.

To expand on (3). If you're developing a tool to handle date
information in GEDCOM files from a variety of sources, the place you
need the most help is actually parsing the date. You want to know
whether the date is syntactically valid, whether it's approximate (and
there are various kinds of approximate), whether the day, month and/or
year are supplied. You'd like to know what language and calendar it's
in. You probably want to write it out again, normalised for
capitalisation, whitespace, language (and calendar, but that might be
hard).
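
As a rough illustration of the "write it out again" step, here is a
small Python sketch that normalises capitalisation and whitespace and
rejects anything it doesn't recognise. The function name
normalise_gedcom_date is hypothetical, and the sketch assumes English
month abbreviations and a single leading ABT/CAL/EST qualifier; language
and calendar are not handled at all.

```python
import re
from typing import Optional

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}
QUALIFIERS = {"ABT", "CAL", "EST"}

def normalise_gedcom_date(text: str) -> Optional[str]:
    """Return the date in canonical GEDCOM form, or None if not understood.

    Only the parts present in the input are written back out.
    """
    tokens = text.upper().split()
    # An optional qualifier may only appear at the front.
    qualifier = tokens.pop(0) if tokens and tokens[0] in QUALIFIERS else None
    if not 1 <= len(tokens) <= 3:
        return None
    # Work from the right: year, then optional month, then optional day.
    year = tokens.pop()
    if not re.fullmatch(r"\d{3,4}", year):
        return None
    month = tokens.pop() if tokens else None
    if month is not None and month not in MONTHS:
        return None
    day = tokens.pop() if tokens else None
    if day is not None:
        if not re.fullmatch(r"\d{1,2}", day) or not 1 <= int(day) <= 31:
            return None
        day = str(int(day))  # drop any leading zero
    parts = [p for p in (qualifier, day, month, year) if p is not None]
    return " ".join(parts)
```

Two files that record the same event as "01 Jan 2000" and
"1  JAN 2000" would then compare equal after normalisation, while
"Jan 2000" stays "JAN 2000" rather than gaining a day.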

A couple of years ago, I wrote a little tool to extract all exact
dates of birth, marriage and death events from a GEDCOM file, and
write them out as a calendar file (iCalendar format, RFC 2445). I was using
Paul Johnson's Gedcom package from CPAN. I started out using
Date::Manip to parse dates (partly because the Gedcom package was
already using it), but I ended up having to write my own parser. The
problem was that Date::Manip's parser would fabricate days ("JAN 2000"
would come back as "1 JAN 2000"). Paul Johnson's date normalisation
routine suffered from the same problem, because it too relied on
Date::Manip. A pity, because date normalisation would be very helpful
when you're comparing information about an event from two GEDCOM
files.
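
For the calendar-export case, the behaviour I wanted can be sketched
roughly like this in Python (not the Perl I actually wrote; the function
name exact_date_or_none is hypothetical). It assumes English month
abbreviations and treats the date as Gregorian, and it returns nothing
at all unless day, month and year are all present and unqualified.

```python
import datetime
import re
from typing import Optional

MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(
        "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1
    )
}

# Exactly "day month year": no qualifier, no missing parts.
_EXACT_RE = re.compile(r"^(\d{1,2}) ([A-Z]{3}) (\d{4})$")

def exact_date_or_none(text: str) -> Optional[datetime.date]:
    """Convert only exact, complete dates; never fill in missing parts."""
    m = _EXACT_RE.match(" ".join(text.upper().split()))
    if not m or m.group(2) not in MONTH_NUMBERS:
        return None
    try:
        return datetime.date(
            int(m.group(3)), MONTH_NUMBERS[m.group(2)], int(m.group(1))
        )
    except ValueError:
        # Impossible dates such as 31 FEB are rejected, not adjusted.
        return None
```

Here "1 JAN 2000" converts, while "JAN 2000" and "ABT 1 JAN 2000" are
simply skipped, so an incomplete or approximate date never turns into a
calendar entry on a made-up day.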

Thanks for starting an unusually interesting discussion.

Stephen



On 20 September 2011 05:35, Ron Savage <r...@savage.net.au> wrote:
>
> On Thu, 2011-09-15 at 12:45 +0200, Eugene van der Pijll wrote:
>
> > * How does your module record the difference between "2000", "JAN 2000"
> >   and "1 JAN 2000"?
>
> How does your code distinguish between these cases?
>
> Do you use one of the GEDCOM concepts About, Calculated, Estimated or
> Interpreted?
>
> Do people want to know that part of a date has been fabricated by the
> code?
>
