Re: [sword-devel] 3-letter language character codes

DM Smith Mon, 09 Nov 2009 12:48:10 -0800

For those who are interested, here is the Perl script, makeISO639.pl, that I use to create the listing for JSword. In order for names to sort better, I'm using the "inverted" name that puts the family name in front of the qualifier.

This means that all the Zapotec languages sort together.
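
To make the sorting point concrete, here is a minimal sketch. The sample names are typed in by hand for illustration, not read from the SIL files, and the variable names are made up for this example.

```perl
use strict;
use warnings;

# Hand-typed sample pairs of [print name, inverted name]; illustrative only.
my @sample = (
    [ "Isthmus Zapotec",  "Zapotec, Isthmus"  ],
    [ "Navajo",           "Navajo"            ],
    [ "Mitla Zapotec",    "Zapotec, Mitla"    ],
    [ "Yatzachi Zapotec", "Zapotec, Yatzachi" ],
);

# Sort once by the print name and once by the inverted name.
my @by_print    = sort { $a cmp $b } map { $_->[0] } @sample;
my @by_inverted = sort { $a cmp $b } map { $_->[1] } @sample;

print "By print name:\n";
print "  $_\n" for @by_print;
print "By inverted name:\n";
print "  $_\n" for @by_inverted;
```

Sorted by print name, Navajo lands between the Zapotec entries; sorted by inverted name, the three Zapotec entries are adjacent.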

(Note, I have to run the output through native2ascii to create a property file):

******************************************************************************
#!/usr/bin/perl

# This file is used to create a Java property file from SIL's ISO639-3 files.

# That file changes frequently both in content and layout.
# Adjust this program as needed.
#
# The files are currently downloaded from:
#       http://www.sil.org/iso639-3/iso-639-3_20090210.tab
#       http://www.sil.org/iso639-3/iso-639-3_Name_Index_20090210.tab
#       http://www.sil.org/iso639-3/iso-639-3_Retirements_20090126.tab
#
# Run the program as:
#       makeISO639.pl > iso639.txt
#
# Sort the file if desired with:
#       makeISO639.pl | sort -t = -k 2 > iso639.txt
#
# Convert it from UTF-8 to Java's ASCII representation with:
#       native2ascii -encoding utf-8 iso639.txt > iso639.properties

use strict;
use Unicode::Normalize;
binmode(STDOUT, ":utf8");

my $nameIndex = "iso-639-3_Name_Index_20090210.tab";
my $langCodes = "iso-639-3_20090210.tab";
my $deadCodes = "iso-639-3_Retirements_20090126.tab";
my %names = ();
open(my $nameIndexFile, "<:utf8", $nameIndex) or die "Cannot open $nameIndex: $!";
# skip the first line
my $firstLine = <$nameIndexFile>;
while (<$nameIndexFile>)
{
        # chomp ms-dos line endings
        s/\r//o;
        chomp();
        # Skip blank lines
        next if (/^$/o);
        # ensure it is normalized to NFC
        $_ = NFC($_);
        my @line = split(/\t/o, $_);
        $names{$line[0],$line[1]} = $line[2];
}

open(my $langFile, "<:utf8", $langCodes) or die "Cannot open $langCodes: $!";
# skip the first line
$firstLine = <$langFile>;
while (<$langFile>)
{
        # chomp ms-dos line endings
        s/\r//o;
        chomp();
        # Skip blank lines
        next if (/^$/o);
        # ensure it is normalized to NFC
        $_ = NFC($_);
        my @line = split(/\t/o, $_);
        # exclude extinct languages
        next if ($line[5] eq 'E');
        my $name = $names{$line[0],$line[6]};
        print "$line[3]=$name\n" if ($line[3]);
        print "$line[0]=$name\n";
}

# The dead codes file is iso-8859-1. This may change at some date.
open(my $deadFile, "<:encoding(iso-8859-1)", $deadCodes) or die "Cannot open $deadCodes: $!";
# skip the first line
$firstLine = <$deadFile>;
while (<$deadFile>)
{
        # chomp ms-dos line endings
        s/\r//o;
        chomp();
        # Skip blank lines
        next if (/^$/o);
        # ensure it is normalized to NFC
        $_ = NFC($_);
        my @line = split(/\t/o, $_);
        print "$line[0]=$line[1]\n";
}
******************************************************************************
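
For anyone without native2ascii at hand, the conversion step mentioned in the script's header can be approximated in Perl. This is only a sketch, assuming the goal is the \uXXXX escape form that Java property files accept. The helper names to_java_escapes and escape_char are made up here, the sample line is invented, and the output is not guaranteed to match native2ascii byte for byte.

```perl
use strict;
use warnings;

# Escape one non-ASCII character as \uXXXX; characters above U+FFFF
# become a UTF-16 surrogate pair, which is what Java expects.
sub escape_char {
    my $cp = ord($_[0]);
    return sprintf("\\u%04x", $cp) if $cp < 0x10000;
    $cp -= 0x10000;
    return sprintf("\\u%04x\\u%04x", 0xD800 + ($cp >> 10), 0xDC00 + ($cp & 0x3FF));
}

# Replace every non-ASCII character in a (decoded) string with its escape.
sub to_java_escapes {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/escape_char($1)/ge;
    return $text;
}

# Invented sample line, only to show the shape of the output.
print to_java_escapes("xx=caf\x{e9}"), "\n";   # prints xx=caf\u00e9
```

The input must already be decoded to characters (as the script does with its ":utf8" layers); feeding raw UTF-8 bytes would escape each byte separately.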

On Nov 9, 2009, at 2:01 PM, DM Smith wrote:

Here is a list of the proposed changes for the last update of 2009 (review ends December 15, so I think we can expect a new listing shortly after that):
        http://www.sil.org/iso639-3/chg_requests.asp
The last column gives the reason for the request.

Perhaps of interest are some Iranian languages.

In His Service,
         DM

On Nov 9, 2009, at 1:32 PM, DM Smith wrote:
On 11/09/2009 11:51 AM, Karl Kleinpaste wrote:
DM Smith <[email protected]> writes:
ISO-639-3 is a changing set of codes.
...
These all changed on 2009-01-16.
What is the point of "standardized" abbreviations if the "standard" is not fixed? "ckw" is replaced with "cak", "tzz" with "tzo"? For whose benefit is that, other than as a make-work issue for people like us?
I don't know all the history, and what I know may be a bit faulty.
There are about 7500 languages. The beginnings of ISO-639 were in the Ethnologue, started in 1950. ISO-639-1 was adopted in 1988. ISO-639-2 was adopted in 1998 and covered about 400 languages. ISO-639-3 was given to SIL in 2002 and the first adoption of it was published in 2007. So only a few years ago, the list was quite small. At that time, some of our modules had Ethnologue codes of the form x-aaa or x-yyy-aaa.
At this point ISO-639-3 encompasses all 2 and 3 letter codes. It is actively maintained and updates happen at least once a year.
Much of the effort to define languages revolves around literacy and Bible translation. It is widely held that the return of Christ is predicated on the gospel being preached to every tongue, and there is an effort to get the Bible into every spoken language. Many languages have no alphabet. My daughter and her husband spent the summer finalizing the alphabets for 3 closely related languages. At this point they, and the team that they were on, believe that these are 3 distinct languages and not merely dialects of each other. As such, they would have three different codes and language names. If later these were found to be merely dialectally different, the 3 alphabets might be merged into one and the 3 different codes and their names would be replaced with one name.
If you look at the reasons for retirement, many of them were 'M', that is, merging several codes into one code.
On a similar note, the two letter codes are not stable either. Hebrew used to have the code 'iw'; now it has the code 'he'. Likewise Indonesian used to have the code 'in', but now it is 'id'. Now with the latest CLDR, 'in' is an alias for 'id'.
These two have bitten me, as Java silently transforms the current code to the obsolete one. 'iw', Hebrew, bit me a few years back. Indonesian, 'in', was last week, as Tonny supplied an Indonesian translation for JSword. We had to name the resource files with the obsolete name to get it to work.
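
One way to cope with this when naming resource files is to keep a small table of the codes that Java rewrites. The sketch below is only an assumption about how such a table could look, using just the two pairs mentioned above. The helper name java_resource_code and the "Messages" base name are hypothetical and not part of JSword, and it assumes a Java runtime that behaves as described here.

```perl
use strict;
use warnings;

# Current code => obsolete code that Java substitutes
# (only the two cases described above).
my %obsolete_for = (
    he => 'iw',   # Hebrew
    id => 'in',   # Indonesian
);

# Return the code to use in a resource file name;
# codes not in the table pass through unchanged.
sub java_resource_code {
    my ($code) = @_;
    return exists $obsolete_for{$code} ? $obsolete_for{$code} : $code;
}

# Hypothetical base name "Messages", just to show the naming.
for my $code (qw(he id fr)) {
    printf "%s -> Messages_%s.properties\n", $code, java_resource_code($code);
}
```

Keeping the pairs in one table means a later change in Java's behaviour only needs an edit in one place.
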
In Him,
  DM

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page