At Fri, 23 Mar 2001 00:13:33 -0800, Rick McGowan <[EMAIL PROTECTED]> wrote:
>David Starner wrote:
>
>> I have a copy of Shellbear's Practical Malay Grammar that I'm preparing
>> to transcribe for Project Gutenberg. Unfortunately, he represents the
>> Malaysian alphabet in a Latin transliteration that includes ng as a
>> single ligatured form, and I don't know how to transcribe in Unicode.
>
>Could you perhaps post or point to a picture of what it looks like? I
>suppose it's an "N" with a loopy tail of some type.

More like rg. A picture is attached. (Was attached. Rick probably has a copy, but it seems to have got lost between here and the Unicode mailing list.)

>The character you are looking for is probably U+014B in lowercase or
>U+014A in uppercase. I would be rather surprised if that's not what
>you're looking for.

It's not exactly what I was looking for. I may just use it and make a note that the glyph is probably not exactly right.

>BTW, a bit off topic here but: I think it's high time that Project
>Gutenberg adopted some very clear character encoding guidelines now that
>they're expanding so widely. Or have they already adopted them and I've
>just missed the policy statement...? They're in for a real mess if they
>don't specify character encodings in a very controlled way.

In places, they already are a real mess. You can dig through the Gutenberg archives and find various (unlabeled) encodings for the Latin-1 repertoire. There's at least one Japanese document that just says "you need a Japanese OS to read this." 8-bit documents are usually labeled as 8-bit, without any indication of which encoding. The Bulgarian files are clearly labeled Windows-1251, at least. OTOH, the policy of doing everything possible in ASCII has saved Gutenberg some problems.
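For illustration, here is a minimal Python sketch (not part of the original exchange) of why the encoding labels matter for a transcription like this one. The eng Rick suggests (U+014B/U+014A) has no slot in ASCII or Latin-1, so it can only go into a Unicode file, and an unlabeled 8-bit byte reads as a different character depending on which code page the reader assumes. The helper names `describe` and `encodable` are hypothetical, and cp1251 is used only as an example of a second 8-bit encoding.

```python
import unicodedata

ENG_LOWER = "\u014b"  # U+014B, suggested for the lowercase ng ligature
ENG_UPPER = "\u014a"  # U+014A, its uppercase counterpart


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def encodable(ch: str, encoding: str) -> bool:
    """True if `ch` can be written in the given encoding without loss."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch in (ENG_LOWER, ENG_UPPER):
        print(describe(ch))
        for enc in ("ascii", "latin-1", "utf-8"):
            print(f"  {enc}: {encodable(ch, enc)}")

    # The same unlabeled byte decodes to unrelated characters
    # depending on which 8-bit encoding the reader guesses.
    raw = b"\xe0"
    print(raw.decode("latin-1"), raw.decode("cp1251"))
```

Only the UTF-8 check succeeds for the eng, which is exactly the case where a file has to be released as Unicode rather than in a standard 8-bit encoding.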
They're moving towards Unicode for any files that can't be released in a standard 8-bit encoding (and a few that can are double-released), and a number of new books are being released in both ASCII and Unicode editions. See ftp://metalab.unc.edu/pub/docs/books/gutenberg/GUTINDEX.02 and GUTINDEX.01 for recent examples. Most of the unmarked files are ASCII, but a number of files are clearly marked as Unicode or as "8-bit German".

-- 
David Starner - [EMAIL PROTECTED]

