Thank you, Peter. I learned about such things during my training with SIL.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Michael Everson
*hxC(V)- ~ *shxC(V)- [the x's to be subscripted] are more like mathematical formulae than text.
They are not mathematical formulae. It is a kind of linguistic (though not phonetic) notation.
Yes. In case anyone isn't sure what the notation means, I believe it is as follows (assuming these works are typical of works in historical linguistics):
* precedes a transcription to indicate it is a historical reconstruction posited by inference from data obtained from later periods in time
~ is used to indicate an alternation; thus, *hxC(V) is in alternation with *shxC(V)
s and h are symbols for particular phonemes; as indicated in the doc, hx is being used to represent the laryngeal with uncertain vowel coloring
C and V, of course, represent an arbitrary consonant and vowel
( ) denotes optionality; thus, the above notation is shorthand for *hxC ~ *shxC and *hxCV ~ *shxCV
So, the expression *hxC(V)- ~ *shxC(V)- is saying, in relation to certain phoneme sequences known to exist in later varieties, that an earlier predecessor of the language(s) in question is believed to have had hC or shC, and hCV or shCV (with the vowel colouring on the h unknown or left unspecified).
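To make the expansion of ( ) and ~ concrete, here is a minimal sketch in Python. It is only an illustration: the function names expand_optional and expand_alternation are made up for this example, the subscript x is written inline as a plain ASCII letter, and nested parentheses are assumed not to occur.

```python
import re
from itertools import product


def expand_optional(pattern: str) -> list[str]:
    """Expand each parenthesised (optional) segment into both variants."""
    # re.split keeps the captured group, so even indices hold literal text
    # and odd indices hold the contents of an optional segment.
    parts = re.split(r"\(([^()]*)\)", pattern)
    choices = [[p] if i % 2 == 0 else ["", p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


def expand_alternation(expr: str) -> list[list[str]]:
    """Split on ~ and expand the optional segments on each side."""
    return [expand_optional(side.strip()) for side in expr.split("~")]


if __name__ == "__main__":
    for forms in expand_alternation("*hxC(V)- ~ *shxC(V)-"):
        print(forms)
```

Each side of the alternation yields two forms, one without and one with the optional vowel, which matches the longhand reading given above.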
My point is that this is a linguistic notation for a kind of formula, where C replaces any actual consonant, ~ acts like a mathematical equivalence operator, and x is a subscript identifying a particular variant of h. This is not some kind of unusual orthography but a specialist scientific notation. It is the same notation as h1, h2, h3 or ha, hb, hc etc. (the second character subscripted in each case) used in all kinds of notational conventions but primarily mathematical and scientific ones. Some linguistics textbooks are full of this kind of notation. For an example chosen almost at random, I found the following in an old paper by Kenneth Pike (in Ruth M. Brend ed. "Advances in Tagmemics", North-Holland 1974, p.238):
(2) eMk = eaTCaf, eaTCgf, egTCaf, egTCgf
where all the lower case letters are subscripted, followed by examples of this in which the word "catch" is followed by subscript af or gf.
My point here is that if we once start on encoding subscript letters used in specialist scientific notation, there is no easy place to stop. Either we need to accept the principle that subscripts are encodable and set aside space for a whole alphabet of them (and an upper case alphabet and a Greek alphabet as well, plus punctuation); or else we need to say from the start that these things are not plain text and should not be encoded in Unicode.
-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/

