2011/7/6 Asmus Freytag <asm...@ix.netcom.com>:
> On 7/3/2011 6:31 AM, Philippe Verdy wrote:
> > Regarding the previous comment about the Danish "aa",
>
> Sorry, most of that discussion missed the mark.
>
> "Modern" Danish can have "AA" for two reasons. Accidental occurrence, as in
> "dataanalyse" which is composed of two words which just happens to put two
> "A" together. The other is frozen spellings for names and the like. In the
> former case, you can never use "å", in the latter case, you may not want to.

I had already understood that perfectly. Maybe you read only part of my message and assumed that I consider them equivalent, which I don't. This was clear in my message.

> In the former case, you do not want to sort "AA" as if it was "å", in the
> latter case, you do.
>
> None of that has anything to do with ASCII - it's a question of orthographic
> practices, not of legacy encoding.

Here again, I have not asserted anything about ASCII, except that it was used (and probably continues to be used) as a practice in Danish when å is not available in a more limited repertoire (including in DNS, where IDNA is not an option).

> Because accidental digraphs (in Danish) happen at word boundaries in a
> compound, the SHY is an elegant way to mark them.

Yes, OK, but in a text where one does not want any word-breaking in the rendered paragraphs (with or without justification of whitespace or microjustification), it would be inconvenient. In fact, in an earlier message I had already favored ZWNJ for that additional control, just as I also favor ZWJ for the usual Danish digraph when it occurs in a Danish word (such as a proper name) inserted in a non-Danish text rendered with automatic word-breaking. For such multilingual documents, in fact, I think that software should probably not even recognize the Danish digraph in those limited occurrences of Danish within text in another language that lacks it, for example when indexing whole texts to create word lists, notably sorted lists, because the unusual Danish word would then not be found at the expected place in the index.
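
To make the sorting point concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how CLDR or any real collation library does it: the names `danish_key`, `RANK` and `exceptions` are hypothetical, the alphabet is reduced to lowercase a-z plus æ, ø, å, and SHY, ZWNJ and Word Joiner are all treated as "this AA is accidental" markers, while ZWJ is treated as "this AA is a real digraph".

```python
import re
import unicodedata

SHY, ZWNJ, ZWJ, WJ = "\u00AD", "\u200C", "\u200D", "\u2060"

# Assumption: a simplified Danish alphabet, with æ, ø, å sorted after z.
RANK = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyzæøå")}

# Controls that mark an accidental "aa" at a compound boundary.
BOUNDARY = re.compile("[\u00AD\u200C\u2060]")


def danish_key(word, exceptions=frozenset()):
    """Sort key that treats "aa" as "å" unless it is marked as accidental.

    `exceptions` stands in for the external dictionary lookup: a set of
    lowercase compounds whose "aa" must never be folded.
    """
    text = unicodedata.normalize("NFC", word).lower()
    bare = BOUNDARY.sub("", text).replace(ZWJ, "")
    if bare in exceptions:
        folded = bare  # the dictionary says: do not fold
    else:
        # Fold "aa" only inside each piece, never across a marked boundary.
        pieces = BOUNDARY.split(text)
        folded = "".join(p.replace(ZWJ, "").replace("aa", "å") for p in pieces)
    # Characters outside the simplified alphabet sort last.
    return [RANK.get(ch, len(RANK)) for ch in folded]


words = ["zebra", "Aalborg", "data\u00ADanalyse", "ål"]
print(sorted(words, key=danish_key))
```

With these assumptions, "Aalborg" gets the same key as "Ålborg" and sorts after "zebra", while the marked compound "data", SHY, "analyse" stays under "d". An unmarked "dataanalyse" is folded wrongly unless it is listed in `exceptions`, which is exactly the case where a dictionary lookup is needed.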

My opinion is still that, in an almost 100% Danish text, nothing is needed: the document should only be parsed as a whole once you know (or at least guess) which language it is written in; then you can use an external dictionary lookup for exceptions such as occurrences in glued compound words like "dataanalyse". (In this word, anyway, the gluing can be encoded explicitly as ZWNJ, or possibly better as Word Joiner, which has the advantage of remaining outside the first grapheme cluster, whereas ZWNJ becomes part of it.)

There's no good universal solution. Users will need to adapt to their working environment and to the rendering and computed semantics they get with each option.

Philippe.