Hi Andrew, thanks for helping to move things forward.

On 10/4/13 9:17 PM, Andrew Sullivan wrote:
> Dear colleagues,
> 
> I reviewed draft-ietf-precis-mappings-03 today, at long last.  I
> apologise for being so late.

No worries. Better that we get it right than that we finish it quickly.
Or maybe that's just a convenient excuse. :-)

> I put together a number of incoherent questions about the case folding
> stuff, but fortunately I re-read the mailing list archives on this
> topic before posting a long message.  I agree with Peter: I find this
> section of the document very confusing, and I think it may be wrong.
> In particular …
> 
> On Thu, Sep 19, 2013 at 04:39:12PM +0900, Takahiro Nemoto wrote:
> 
>> Considering the maintenance and preservation of the document, 
>> leaving it the way it is now is not a bad idea.
> 
> …I am pretty sure it shouldn't be left the way it is.
> 
> It seems to me that our principle generally needs to be that Unicode
> is the thing we use, and if Unicode is broken it's Not Our Problem.
> So we should figure out how to say, "Do the Unicode-y right thing
> here," and then put that in.  I especially don't want to get into
> specifying special language-specific tables ourselves: we don't have
> the expertise, I think.
> 
> I can't think of any better suggested text than what Peter already
> sent, so I think that's the right direction.  In the unlikely event
> something clearer comes to me in the night, I promise to write it
> down.

I suggested text to clear up the first paragraph. However, my message
only raised the key question without answering it: what are we
trying to accomplish here?

As I noted, Appendix B.1 simply matches the Language-Sensitive Mappings
from the SpecialCasing.txt file in the Unicode Character Database. If
that's *all* we're trying to accomplish, then we could simply say "apply
the Language-Sensitive Mappings in SpecialCasing.txt".

However, I get the sense that we're actually trying to accomplish more,
e.g., applying at least the context-sensitive mapping for Greek final
sigma -- in my example, a nickname of "ΦΙΛΟΣ ΜΟΙ" would be case folded
to "φιλος μοι" (with a Greek final sigma, which is correct in Greek) and
not to "φιλοσ μοι" (with a Greek medial sigma, which is incorrect in Greek).
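
To make the sigma difference concrete, here is a minimal Python sketch.
It is an illustration only, not proposed text: Python's built-in
str.lower() and str.casefold() stand in for the mappings under
discussion, and the variable names are mine.

```python
# Illustration only: Python built-ins stand in for the mappings discussed.
nickname = "ΦΙΛΟΣ ΜΟΙ"

# str.lower() honours the Final_Sigma context condition, so the sigma
# that ends the first word becomes final sigma (U+03C2).
lowered = nickname.lower()

# str.casefold() is context-free: every sigma, including final sigma,
# is folded to medial sigma (U+03C3).
folded = nickname.casefold()

print(lowered)  # φιλος μοι
print(folded)   # φιλοσ μοι
```

So a context-free fold yields the medial form everywhere, while a
context-sensitive lowercase mapping yields the form that is correct in
Greek.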

It's also not clear to me if we have a position on full case folding vs.
simple case folding (e.g., full case folding maps "ẞ" = U+1E9E to "ss",
whereas simple case folding maps it to "ß" = U+00DF).
It seems to me that we might want to suggest a consistent approach here
so that we have improved interoperability.
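
A small Python sketch of why the choice matters for interoperability.
Assumptions: str.casefold() is used as the full case folding, and since
Python has no built-in simple case folding, str.lower() is used as a
stand-in that happens to give the same single-character result for
U+1E9E. The variable names are mine.

```python
# Illustration only: casefold() = full case folding; lower() is a
# stand-in for simple case folding for this one character.
capital_sharp_s = "\u1E9E"  # ẞ

full_folded = capital_sharp_s.casefold()  # two characters: "ss"
single_char = capital_sharp_s.lower()     # one character: "ß" (U+00DF)

# With full folding, both spellings of the word compare equal.
same_with_full = "STRASSE".casefold() == "straße".casefold()

# With a one-to-one mapping, "ß" survives and the comparison fails.
same_with_single = "STRASSE".lower() == "straße".lower()

print(full_folded, single_char, same_with_full, same_with_single)
```

If two implementations pick different options, the same pair of strings
can match in one and fail to match in the other.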

So IMHO one approach would be:

1. Apply the language-sensitive mappings from SpecialCasing.txt
2. Apply the context-sensitive (i.e., "language-insensitive") mappings
from SpecialCasing.txt
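
The two steps above could be sketched roughly as follows in Python. This
is a sketch under stated assumptions, not a complete implementation: the
function name and table are hypothetical, the table covers only two of
the Turkish/Azeri lowercase entries (ignoring their dot-above
conditions and the Lithuanian rules), and str.lower() is assumed to
supply the context-sensitive Final_Sigma handling.

```python
# Hypothetical, partial table: two tr/az lowercase mappings only.
# The real SpecialCasing.txt entries carry extra context conditions
# (e.g. Not_Before_Dot) that this sketch ignores.
TR_AZ_LOWER = {
    "\u0130": "i",   # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
    "I": "\u0131",   # LATIN CAPITAL LETTER I -> dotless i
}


def lowercase_with_mappings(text, lang=None):
    """Apply language-sensitive mappings, then context-sensitive ones."""
    # Step 1: language-sensitive mappings (only when the language is known).
    if lang in ("tr", "az"):
        text = "".join(TR_AZ_LOWER.get(ch, ch) for ch in text)
    # Step 2: context-sensitive mappings; str.lower() handles final sigma.
    return text.lower()


print(lowercase_with_mappings("ISTANBUL", "tr"))   # ıstanbul
print(lowercase_with_mappings("ISTANBUL"))         # istanbul
print(lowercase_with_mappings("ΦΙΛΟΣ ΜΟΙ"))        # φιλος μοι
```

Running step 1 first matters: once the default lowercase mapping has
turned "I" into "i", the language-specific information is lost.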

I'm still not sure what to do about full vs. simple case mapping,
but I see no strong reason to prefer simple case mapping because I don't
see a problem with our algorithm resulting in two characters (e.g.,
"ss") instead of one.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/


_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
