Many precomposed characters were initially encoded only for round-trip compatibility with preexisting encoding standards. But the preferred approach going forward was to encode combining characters separately (at least those that are not normally attached to, or overstriking, the base letter).
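For instance (a minimal sketch in Python; the same grapheme "é" spelled both ways):

    precomposed = "\u00E9"    # é as a single precomposed character, U+00E9
    decomposed = "e\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT

    print(precomposed, decomposed)    # both render as é
    print(precomposed == decomposed)  # False: the code point sequences differ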
Very soon, the normalization forms were created to unify the two possible types of encoding and to introduce the concept of canonical equivalence. The most common practice, though, was to use the precomposed forms, so the NFC form became a de facto standard (but the decomposed forms are not deprecated at all: the standard has made a lot of effort to ensure that these forms are fully equivalent, and implementers have been urged, notably for plain-text searches and for rendering, to treat all canonically equivalent forms the same way). There is only one exception: collation, where the difference is invisible at all collation strengths and is considered only for sorting canonically equivalent forms together in a stable order, stability being reached by adding an artificial final level that sorts in binary order. That binary order is still often not standardized; implementers may notably sort either by the numeric values of code points or by the numeric values of code units in some UTF encoding. It is in fact purely arbitrary and only meant to ensure sort stability.

The complexity of implementations that must treat canonically equivalent texts the same led to the definition of "conforming processes". And to make sure that processes would remain conforming, and that encoded texts (in any normalization form, and independently of the UTF used for parsing, storing, or sending texts) would remain usable after being encoded once, even as the Unicode and ISO/IEC 10646 standards evolve, it soon became necessary to add the concept of "encoding stability", and this was formalized by a strong policy. One consequence of this stability policy is that we can no longer encode new precomposed characters for grapheme clusters that are already encodable in an existing standard form. The combining macron was encoded separately long ago, as were all basic Greek letters (even before many polytonic characters were introduced in the UCS). This now makes it impossible to introduce new characters for Greek letters with macron (unless we accepted making them not canonically equivalent, which would create serious issues, because conforming processes will not be rewritten or modified to recognize additional "visual equivalences", which are different from "canonical equivalences"); a short sketch illustrating this follows below. So you have to live with it, as long as the UCS remains a universal standard supported both by international standards bodies and by industry (you can expect it to remain a standard for at least a century, and even after that it will remain widely used, and interchange will still be needed, because tremendous amounts of archived data will never be re-encoded).

But this is not a problem. The de facto NFC form (used for compatibility with old processes) matters less and less: many processes now recognize the canonical equivalences and are able to process grapheme clusters encoded as several characters, including combining characters. Storage space is also no longer a major issue (the problem lies less in the encoding of a few clusters than in the growing amount of encoded text). If storage size really matters, we have long had binary data compressors such as ZIP/deflate and gzip (which today are extremely fast, are applied only when the complete encoded text is input or output, and remain invisible to the more complex, application-specific text parsers).
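A minimal sketch (using only Python's standard unicodedata module) of how normalization unifies the two encodings, and why the Greek cluster discussed above can never gain a fully precomposed form:

    import unicodedata

    # Canonical equivalence: NFC/NFD map equivalent spellings together.
    print(unicodedata.normalize("NFC", "e\u0301") == "\u00E9")   # True

    # Greek iota + combining macron + smooth breathing + acute, decomposed:
    cluster = "\u03B9\u0304\u0313\u0301"
    nfc = unicodedata.normalize("NFC", cluster)
    print(" ".join(f"U+{ord(c):04X}" for c in nfc))
    # U+1FD1 U+0313 U+0301 -- iota + macron composes into U+1FD1 (encoded
    # long ago), but no precomposed character covers the full cluster, and
    # the stability policy guarantees that none will ever be added.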
It may still be a minor issue for "texts" that must be extremely compact, such as those used as "identifiers".

* But identifiers are generally invisible to users, and it is accepted that the exact orthography of identifiers is simplified (identifiers are frequently abbreviated as well). In fact, identifiers avoid many standard practices applicable to normal texts (for example, not using presentation forms, limiting the usage of punctuation, avoiding changes in capitalization, or adding new requirements completely foreign to the normal orthographic/grammatical rules of a human language). If your application using identifiers cannot use combining characters, you will drop them (and make identifiers distinct by some other means, such as adding digits or another convention); a minimal sketch of such folding follows after the quoted message below.

* In other cases, strong length limitations will occur in data input forms that use too-short storage sizes (notably in databases). The needed increase in maximum storage length is also something to take into account if your application needs to handle international texts. There are good practices to follow: generally this means creating a UI where data fields can be read without scrolling horizontally or breaking lines, choosing appropriate font sizes, and then defining a database length that allows input and full storage of any text that fits in the displayed fields (for example, using VARCHAR(80) and not VARCHAR(12) if you can input 12 Latin characters in your UI). Some newer languages do not need to restrict the storage length of their strings, and applications written in them will not be impacted by these size limits or restrict storage sizes to small values (database engines supporting texts with unspecified maximum length will still have a limitation, but it is generally long enough that you will still be able to fit any text from a reasonable input form; if this limit is 255 code units, it is still long enough to store a single data input field on any form for entering text in any existing language).

So, in summary, no decision actually led to excluding the encoding of Greek letters with macrons; rather, no need for it was ever established, as they were still not encoded in any prior standard, and the UCS already encodes them, while TUS already standardizes the best practices for handling them in any standardized normalization form and standardized encoding (or legacy encodings supported for round-trip compatibility and listed in an informative appendix of the Unicode standard).

2013/8/3 Stephan Stiller <[email protected]>

> Characters restricted to dictionaries are generally not well
>> supported.
>>
> And modern textbooks in a modern world :-)
>
> The practice in Scott and
>> Liddell is to reserve ᾱ, ῑ and ῡ for a note after the dictionary entry.
>>
> Liddell & Scott is old, just like Lewis & Short. We've moved on since
> then, and given the stuff that's been put into the Greek blocks (things
> that for sure aren't even in most dictionaries) I was just surprised.
> Whatever the rationale for original precomposition and later inclusion of
> more characters was, I suppose common practice instead of inclusiveness was
> a criterion.
>
> With that written, thanks for the info.
>
> ῑ̓́φιος [...] ῑ̓́ (which should be thought of as ῑ
>>>
>>> with two combining diacritics: U+1FD1 U+0313 U+0301)
>>>
>> You overlooked the smooth breathing for the first iota.
>>
> It's there. Check again.
>
> Stephan
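PS: here is the minimal sketch announced above (Python standard library only; fold_identifier is a hypothetical helper for illustration, not any standard API) of identifier folding and of the storage-length point:

    import unicodedata

    def fold_identifier(text: str) -> str:
        # Decompose (NFD), then drop combining marks (general category Mn),
        # keeping only the base characters; distinctness must then be
        # restored by other means (digits or another convention).
        decomposed = unicodedata.normalize("NFD", text)
        return "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")

    word = "\u1FD1\u0313\u0301\u03C6\u03B9\u03BF\u03C2"   # ῑ̓́φιος
    print(fold_identifier(word))   # ιφιος: macron, breathing, acute dropped

    # Storage sizing: the same text occupies different numbers of characters
    # and UTF-8 bytes, so a VARCHAR limit must be chosen against the unit
    # the database actually counts.
    print(len(word), len(word.encode("utf-8")))   # 7 characters, 15 bytes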

