Hi Mike

On Thu, 2011-09-15 at 14:02 +1000, Mike Hamilton wrote:
> Further thoughts on DateTime::Format::Gedcom language issues:
>
> This may be heretical, but it seems to me that attempting to provide
> *universal* language support opens up a can of worms that (unless one is
> prepared to devote a lifetime or three to the task) is too large to be
> digested. The required knowledge of individual languages surely presents an
> insurmountable hurdle.

I certainly don't want to support anything even vaguely approaching 'universal'. I'm hoping no knowledge of /languages/ is required, but rather only of /calendars/. And even that will be minimal.

> Ron mentions (in another context) modern French. As it happens, I have some
> French ancestry, and enough elementary knowledge to know my avril from my
> elbow; but I have no idea how dates are represented in Swahili, Farsi,
> Mandarin or the inverse click of the Kalahari Bushmen. I'll bet that there
> are many, many weird and wonderful (to Western mindsets) ways of describing
> dates.

True, but the problem is presumably tractable precisely because we're dealing with nothing but GEDCOM dates.

> LANGUAGE_ID in the GEDCOM spec has:
>
> Afrikaans | Albanian | Anglo-Saxon | Catalan | Catalan_Spn | Czech | Danish |
> Dutch | English | Esperanto | Estonian | Faroese | Finnish | French | German
> | Hawaiian | Hungarian | Icelandic | Indonesian | Italian | Latvian |
> Lithuanian | Navaho | Norwegian | Polish | Portuguese | Romanian | Serbo_Croa
> | Slovak | Slovene | Spanish | Swedish | Turkish | Wendic
>
> plus ("other languages not supported until UNICODE")
>
> Amharic | Arabic | Armenian | Assamese | Belorusian | Bengali | Braj |
> Bulgarian | Burmese | Cantonese | Church-Slavic | Dogri | Georgian | Greek |
> Gujarati | Hebrew | Hindi | Japanese | Kannada | Khmer | Konkani | Korean |
> Lahnda | Lao | Macedonian | Maithili | Malayalam | Mandrin |Manipuri |
> Marathi | Mewari | Nepali | Oriya | Pahari | Pali | Panjabi | Persian |
> Prakrit | Pusto | Rajasthani | Russian | Sanskrit | Serb | Tagalog | Tamil |
> Telugu | Thai | Tibetan | Ukrainian | Urdu | Vietnamese | Yiddish ]
>
> Now, I hear you saying "that's ridiculous - I've never seen a GEDCOM in
> Navaho, Faroese, or Rajasthani, and never will", which is a very fair point.
> But DateTime::Format::Gedcom claims to parse GEDCOM dates; it doesn't say
> "some conditions apply."
>
> Therefore, I reluctantly and unhappily suggest that DateTime::Format::Gedcom
> should be a base class, from which DateTime::Format::Gedcom::English,
> DateTime::Format::Gedcom::French, DateTime::Format::Gedcom::Sanskrit [...]
> would derive.
>
> Yes, it's ghastly. The old joke about "surpasseth all understanding" =
> "understands all parsers" applies.

Nope, not worried. GEDCOM's definition of date_calendar_escape mercifully does not refer to language_id (of which there are just 3 references in the doc).

I chose DateTime::Format::Natural so as to hand off to it as much work as possible, leaving myself with as little work as possible. That's what CPAN is all about.

Nevertheless, Mike and Mike's comments make me think I will have to rework the code, along the lines of:

    set_month_names($language, $array_of_month_stuff)

where $language is recognized when it appears in a date_calendar_escape, and the arrayref looks like ['January', 'Jan', 30, ...], with 3 elements per month.

I still don't like the idea of parsing the dates myself, but I assume it will come to that.

--
Ron Savage
http://savage.net.au/
Ph: 0421 920 622
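For illustration, here is a minimal Perl sketch of how the proposed set_month_names() might be wired up. It is an assumption throughout, not DateTime::Format::Gedcom's actual code: %month_table, parse_escape() and month_number() are hypothetical helpers; the third element of each month entry is assumed to be the number of days; and the name inside the escape (e.g. 'FRENCH R' from '@#DFRENCH R@') is used as the $language key, since date_calendar_escape names a calendar rather than a LANGUAGE_ID. The example registers the French Republican calendar using GEDCOM's month codes (VEND ... FRUC, COMP) with unaccented full names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical registry: calendar name from a date_calendar_escape
# (e.g. 'FRENCH R' from '@#DFRENCH R@') => arrayref of month records.
my %month_table;

# set_month_names($language, $array_of_month_stuff)
# Flat arrayref, 3 elements per month: full name, GEDCOM abbreviation,
# and (an assumption) the number of days in the month.
sub set_month_names {
    my ($language, $months) = @_;
    die "Expected 3 elements per month\n" if @$months % 3;
    my @table;
    for (my $i = 0; $i < @$months; $i += 3) {
        push @table, {
            name   => $months->[$i],
            abbrev => uc $months->[$i + 1],
            days   => $months->[$i + 2],
        };
    }
    $month_table{uc $language} = \@table;
    return scalar @table;
}

# Split an optional leading calendar escape off a GEDCOM date.
# Dates without an escape are treated as Gregorian.
sub parse_escape {
    my ($date) = @_;
    return (uc $1, $2) if $date =~ /^\s*\@#D([^\@]+)\@\s*(.*)$/;
    return ('GREGORIAN', $date);
}

# Return the 1-based month number for an abbreviation in the given
# calendar, or undef if the calendar or abbreviation is unknown.
sub month_number {
    my ($language, $abbrev) = @_;
    my $table = $month_table{uc $language} or return;
    for my $i (0 .. $#$table) {
        return $i + 1 if $table->[$i]{abbrev} eq uc $abbrev;
    }
    return;
}

# Example: the French Republican calendar (ASCII spellings, no accents).
my @names   = qw(Vendemiaire Brumaire Frimaire Nivose Pluviose Ventose
                 Germinal Floreal Prairial Messidor Thermidor Fructidor);
my @abbrevs = qw(VEND BRUM FRIM NIVO PLUV VENT GERM FLOR PRAI MESS THER FRUC);
my @french  = map { ($names[$_], $abbrevs[$_], 30) } 0 .. $#names;
push @french, 'Complementary days', 'COMP', 5;    # 6 in leap years
set_month_names('FRENCH R', \@french);

my ($calendar, $rest) = parse_escape('@#DFRENCH R@ 1 VEND 1');
my ($day, $mon, $year) = split ' ', $rest;
my $month = month_number($calendar, $mon);
printf "%s: day %s, month %d, year %s\n", $calendar, $day, $month, $year;
# prints "FRENCH R: day 1, month 1, year 1"
```

Keying the table on the calendar named in the escape means adding a calendar is one set_month_names() call with a month list, with no per-language subclass needed.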