> > Simply because some images appear in some documents does not mean
> > that they automatically should be represented as encoded characters.
>
> These aren't images. They're clearly letters; they occur in running
> texts and represent the sounds of a spoken language.
Well, I agree with that assessment.

> If I were transcribing them, I wouldn't encode them as pictures; I would
> encode them as PUA elements or XML elements (which are usually easier
> to use and more reliable than the PUA).

And with that assessment, as well.

> I'll admit that it's a bit sketchy encoding these characters based on
> one article by one author. But I think it important to remember that
> more and more text is available online, even stuff that might never get
> reprinted in hardcopy, and that needs Unicode.

And in general, I can't find fault with that, either. But the argument in
this particular case hinges on a particular, nonce set of characters.

We have this one scholar, who invented a bunch of characters in the 1920's
to represent click sounds that nobody was doing justice to at that point,
either in understanding their phonetics or in making sufficiently accurate
distinctions in their recording. Bully for Dokes -- it was an important
advance in the field of Khoisan studies and the phonetics of clicks. But
even though he published his analysis, using his characters, nobody else
chose to adopt his character conventions. Subsequent scholars, and the
IPA, chose *other* characters to represent the distinctions involved, in
part because Dokes' inventions were just weird and hard to use, as well as
neither (in my opinion) mnemonic nor aesthetically pleasing.

Well, we've encoded ugly letters for ugly orthographies in ugly scripts
before. That isn't the issue. But the non-use of these forms is.

It comes down then to a *prospective* claim that someone *might* want to
digitize the classic Dokes publication, and that if they did so they would
require the particular set of weird phonetic letters used by Dokes to be
representable in Unicode plain text in order for that one publication to
be made available electronically. (Or a few other publications that might
cite Dokes verbatim, of course.)

Well, in terms of requirements, I consider that more than a little cart
before the horse. I'd be more sympathetic if someone were actually
*trying* to do this and had a technical problem with representing the text
accurately for an online edition which was best resolved by adding a dozen
characters to the Unicode Standard. Then, at least, there would be a valid
*use* argument to be made, as opposed to a scare claim that 50 years from
now someone *might* want to do this and not be able to if we don't encode
these characters right now.

Right *now* anyone could (if they had the rights) put a version of Dokes
online using pdf and an embedded font, and it would be perfectly
referenceable for anyone wanting access to the content of the document.
True, the dozen or so "weird" characters in the orthography wouldn't have
standard encodings, so searching inside the document for them wouldn't be
optimal. But is the burden that might place on the dozen or so Khoisan
orthographic historians and phonetic historians who might actually be
interested in doing so out of scale with the burden placed permanently on
the standard itself by adding a dozen or so nonce characters for that
*one* document? After all, those historians and scholars today are
basically using the document in its printed-only (out-of-print) hard copy
format, and we aren't exactly worried about the difficulties that *that*
poses them, now are we?
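(As an aside, by "wouldn't have standard encodings" I just mean that a
transcriber could still carry the forms in one of the two ways quoted
above -- as agreed-upon PUA code points, or as explicit markup. A rough
sketch of the two options in a few lines of Python; every code point,
name, and element here is invented purely for illustration, not a
proposal:

    # Sketch only: code points, names, and markup are hypothetical.

    # Option 1: Private Use Area. Assign each nonce letter an arbitrary
    # PUA code point and rely on an agreed-upon embedded font for display.
    DOKE_PUA = {
        "doke-click-a": "\uE701",   # hypothetical PUA assignment
        "doke-click-b": "\uE702",   # hypothetical PUA assignment
    }

    # Option 2: markup. Keep the surrounding text in ordinary characters
    # and flag each nonce letter with an element naming it explicitly.
    def as_markup(name: str, fallback: str = "?") -> str:
        """Wrap the letter in a self-describing XML element."""
        return '<g ref="#%s">%s</g>' % (name, fallback)

    print(DOKE_PUA["doke-click-a"])    # displays only with the right font
    print(as_markup("doke-click-a"))   # font-independent, but not plain text

Neither option gives you standardized searching, which is exactly the
trade-off at issue.)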
I might point out at this point that the Unicode Standard itself is
published online using non-standard encodings for many of its textual
examples, simply because of the limitations of FrameMaker and PDF and
fonts and the specialized requirements of citing lots and lots of
characters outside normal text contexts. But I don't hear people yelling
that the online Unicode Standard is crippled for use by people who wish to
refer to it because you can't do an automated search for <ksha> in it
which will accurately find all instances of Devanagari ksha in the text.

And the *database* arguments just don't cut it. If anybody is seriously
going to be using Dokes materials in comparative Khoisan studies, they
will *normalize* the material in their text databases. After all, this is
just one source among a large body of very varied material, in all kinds
of orthographies, and at all levels of detail and quality. The argument
for making these particular dozen nonce characters searchable by giving
them standard Unicode values just doesn't cut it for me, because if I were
going to do that kind of work, a significant amount of philological work
would be required to "massage" the data into comparable formats anyway,
and use of intermediate normalized conventions would not be a problem --
in fact, it would almost be mandatory.

Finally, if someone actually wants to do a redacted publication of Dokes
for its *content*, as opposed to its orthographic antiquarian interest, it
is perfectly possible to do so with an updated set of orthographic
conventions that would make it more accessible to people used to modern
IPA usage. Usability of published or republished documents is not limited
to slavish facsimile reproduction of their original form -- for that we
have facsimiles. :-) I love Shakespeare, but I don't have to read his
plays with long esses and antique typefaces.

--Ken
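P.S. For what it's worth, the kind of "normalization" I have in mind is
nothing more exotic than a mapping table applied while loading the data.
The sketch below (Python again) uses made-up placeholder code points for
the Dokes forms and made-up equivalences to the standard click letters,
purely for illustration:

    # Hypothetical normalization step for a comparative database.
    # The placeholder code points and the click equivalences are invented.
    NORMALIZE = {
        "\uE701": "\u01C0",  # placeholder -> U+01C0 LATIN LETTER DENTAL CLICK
        "\uE702": "\u01C2",  # placeholder -> U+01C2 LATIN LETTER ALVEOLAR CLICK
    }

    def normalize(text: str) -> str:
        """Replace project-internal placeholders with the database's conventions."""
        return "".join(NORMALIZE.get(ch, ch) for ch in text)

The philological decisions behind such a table are the hard part; the
table itself is trivial.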

