Hi Andrew, thanks for helping to move things forward.

On 10/4/13 9:17 PM, Andrew Sullivan wrote:
> Dear colleagues,
>
> I reviewed draft-ietf-precis-mappings-03 today, at long last. I
> apologise for being so late.
No worries. Better that we get it right than that we finish it quickly. Or maybe that's just a convenient excuse. :-)

> I put together a number of incoherent questions about the case folding
> stuff, but fortunately I re-read the mailing list archives on this
> topic before posting a long message. I agree with Peter: I find this
> section of the document very confusing, and I think it may be wrong.
> In particular …
>
> On Thu, Sep 19, 2013 at 04:39:12PM +0900, Takahiro Nemoto wrote:
>
>> Considering the maintenance and preservation of the document,
>> leaving it the way it is now is not a bad idea.
>
> …I am pretty sure it shouldn't be left the way it is.
>
> It seems to me that our principle generally needs to be that Unicode
> is the thing we use, and if Unicode is broken it's Not Our Problem.
> So we should figure out how to say, "Do the Unicode-y right thing
> here," and then put that in. I especially don't want to get into
> specifying special language-specific tables ourselves: we don't have
> the expertise, I think.
>
> I can't think of any better suggested text than what Peter already
> sent, so I think that's the right direction. In the unlikely event
> something clearer comes to me in the night, I promise to write it
> down.

I suggested text to clear up the first paragraph. However, my message merely asked the key question, but left it unanswered: what are we trying to accomplish here?

As I noted, Appendix B.1 simply matches the Language-Sensitive Mappings from the SpecialCasing.txt file in the Unicode Character Database. If that's *all* we're trying to accomplish, then we could simply say "apply the Language-Sensitive Mappings in SpecialCasing.txt".
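To make the "language-sensitive mappings only" option concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not text from the draft: the helper name `lower_language_sensitive` and the `TURKIC_LOWER` table are hypothetical, the table covers only the Turkish/Azerbaijani (tr, az) lowercase entries for I and İ from SpecialCasing.txt, and the conditions attached to those entries (such as Not_Before_Dot and After_I) are deliberately ignored.

```python
from typing import Optional

# Hypothetical, partial table: tr/az lowercase mappings from SpecialCasing.txt.
# Assumption: the contextual conditions on these entries are ignored here.
TURKIC_LOWER = {
    "\u0130": "\u0069",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
    "\u0049": "\u0131",  # LATIN CAPITAL LETTER I -> dotless i
}


def lower_language_sensitive(text: str, lang: Optional[str] = None) -> str:
    """Lowercase text, applying the tr/az overrides first when requested."""
    if lang in ("tr", "az"):
        # Replace the language-sensitive characters before generic lowercasing.
        text = "".join(TURKIC_LOWER.get(ch, ch) for ch in text)
    # Everything else goes through Python's built-in lowercasing.
    return text.lower()


if __name__ == "__main__":
    print(lower_language_sensitive("DIYARBAKIR", "tr"))
    print(lower_language_sensitive("DIYARBAKIR"))
```

The point of the sketch is that the language tag has to come from somewhere outside the string itself, which is one reason the choice between "only SpecialCasing.txt" and something broader matters.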
However, I get the sense that we're actually trying to accomplish more, e.g., applying at least the context-sensitive mapping for Greek final sigma -- in my example, a nickname of "ΦΙΛΟΣ ΜΟΙ" would be case folded to "φιλος μοι" (with a Greek final sigma, which is correct in Greek) and not to "φιλοσ μοι" (with a Greek medial sigma, which is incorrect in Greek).

It's also not clear to me if we have a position on full case folding vs. simple case folding (e.g., ẞ = U+1E9E to "ss" instead of "ß" = U+00DF). It seems to me that we might want to suggest a consistent approach here so that we have improved interoperability.

So IMHO one approach would be:

1. Apply the language-sensitive mappings from SpecialCasing.txt

2. Apply the context-sensitive (i.e., "language-insensitive") mappings from SpecialCasing.txt

I'm still not sure what to do about full vs. simple case mapping, but I see no strong reason to prefer simple case mapping because I don't see a problem with our algorithm resulting in two characters (e.g., "ss") instead of one.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/

_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
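As an illustration of the final-sigma and full-vs-simple points discussed in the message, the following Python sketch may help. It relies on an assumption worth stating loudly: Python's built-in `str.lower()` and `str.casefold()` are used only as stand-ins for "context-sensitive lowercasing" and "full case folding"; nothing in the draft says implementations should use them, and the variable names are made up for this example.

```python
# Assumption: Python's str.lower() applies the context-sensitive final-sigma
# rule, while str.casefold() applies full case folding with no context.
nickname = "ΦΙΛΟΣ ΜΟΙ"

lowered = nickname.lower()    # capital sigma at the end of a word -> final sigma
folded = nickname.casefold()  # every sigma -> medial sigma

# Full vs. simple mapping for U+1E9E LATIN CAPITAL LETTER SHARP S.
capital_sharp_s = "\u1E9E"
full_fold = capital_sharp_s.casefold()  # two characters: "ss"
one_to_one = capital_sharp_s.lower()    # one character: U+00DF

if __name__ == "__main__":
    print(lowered)
    print(folded)
    print(full_fold, one_to_one)
```

The two Greek results differ only in the sigma at the end of the first word, which is exactly the interoperability gap between the two approaches; the sharp-s lines show that the full mapping can yield two characters where a one-to-one mapping yields one.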
