On 9/4/16 6:34 PM, Peter Saint-Andre wrote:
On 9/4/16 5:30 PM, Erin Millard wrote:
>>> * §2.2 Specifies that UTF-8 MUST be used as the encoding; do we really
>>> want to limit this to UTF-8 only? Is this for comparison purposes?
>>> Then again, 99.99% of the time UTF-8 is what you should be using
>>> anyways, so I'm not sure that it matters.
>>
>> UTF-8 is your friend, and everything in PRECIS is UTF-8.
>
> PRECIS is mostly encoding agnostic; implementations might favor a
> specific encoding, but I don't think anything in the spec specifically
> *needs* UTF-8. That being said, there are so few reasons to use
> anything other than UTF-8 that I don't think it really matters, it was
> just curious to me that some of the PRECIS related specs called out
> UTF-8 and some didn't.
I thought they all did, but will double-check.
This actually became a bigger issue when attempting to implement PRECIS
prepare in JavaScript for the browser. JavaScript doesn't have native
UTF-8 support, so this meant the extra bloat of bringing in a UTF-8
library.
It didn't make a lot of sense to me either, since all the encoding
affects is how you go from string to code points, and vice versa. It had
no effect on the rest of my implementation. I could absolutely be
missing something, but compared to how focused the rest of the spec is,
the UTF-8 requirement seemed like an afterthought.
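A minimal sketch of that point, assuming a TypeScript implementation: the only encoding-aware step is converting between the platform's native string form (UTF-16 code units in JavaScript) and Unicode code points. The helper names toCodePoints and fromCodePoints are hypothetical and do not come from any PRECIS library; PRECIS rules would then operate on the resulting number array.

```typescript
// Hypothetical helper: decode a JavaScript string (UTF-16 code units)
// into Unicode code points, with no UTF-8 step involved.
function toCodePoints(input: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one
    // supplementary-plane code point.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        out.push(((unit - 0xd800) << 10) + (next - 0xdc00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP code point, or an unpaired surrogate passed through unchanged
    // so that a later validation step can reject it.
    out.push(unit);
  }
  return out;
}

// Hypothetical helper: the inverse step, from code points back to a string.
function fromCodePoints(codePoints: number[]): string {
  let result = "";
  for (const cp of codePoints) {
    if (cp > 0xffff) {
      const offset = cp - 0x10000;
      result += String.fromCharCode(
        0xd800 + (offset >> 10),
        0xdc00 + (offset & 0x3ff)
      );
    } else {
      result += String.fromCharCode(cp);
    }
  }
  return result;
}
```

Everything between these two calls works on plain numbers, which is why the choice of encoding does not reach the rest of such an implementation.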
Can anyone explain which parts of PRECIS are actually predicated on the
original string being encoded in UTF-8?
Are we perhaps getting confused between the encoding that is sent over
the wire and the encoding that is used within the processing application?
In general, we in the IETF prefer to send UTF-8 over the wire. However,
it's true that this is a matter for the "using protocol" (e.g., I
distinctly recall an extremely long thread in the XMPP WG years ago
about whether to support only UTF-8 or to give clients and servers the
ability to also use UTF-16 - and "UTF-8 only" won that debate). Given
that some protocols or other technologies that use PRECIS might use
UTF-16 or give applications the ability to choose an encoding, you're
probably right that it makes sense to relax the rule for PRECIS itself.
I'll think about this some more and propose some text.
As promised, I've thought about it further and I agree that specifying
an encoding of UTF-8 is not really appropriate in 7613bis and 7700bis.
In fact, RFC 7564 (the PRECIS framework) states the following in §13.1:

   Although strings that are consumed in PRECIS-based application
   protocols are often encoded using UTF-8 [RFC3629], the exact encoding
   is a matter for the application protocol that uses PRECIS, not for
   the PRECIS framework.

Thus, for instance, it's fine for RFC 7622, which defines the address
format in XMPP, to specify an encoding of UTF-8, but not for 7613bis or
7700bis to do so.
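To illustrate the separation, here is a rough TypeScript sketch, not taken from any of the specs: the same sequence of code points (what PRECIS operates on) can be serialized as UTF-8 or as UTF-16, depending on what the using protocol requires. The function names encodeUtf8 and encodeUtf16BE are hypothetical, and the sketch assumes its input has already been validated (no surrogate code points, nothing above U+10FFFF).

```typescript
// Hypothetical helper: serialize code points as UTF-8 bytes (RFC 3629).
// Assumes the input contains only valid Unicode scalar values.
function encodeUtf8(codePoints: number[]): number[] {
  const bytes: number[] = [];
  for (const cp of codePoints) {
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}

// Hypothetical helper: serialize the same code points as big-endian UTF-16.
function encodeUtf16BE(codePoints: number[]): number[] {
  const bytes: number[] = [];
  const pushUnit = (unit: number): void => {
    bytes.push(unit >> 8, unit & 0xff);
  };
  for (const cp of codePoints) {
    if (cp > 0xffff) {
      // Supplementary-plane code points become a surrogate pair.
      const offset = cp - 0x10000;
      pushUnit(0xd800 + (offset >> 10));
      pushUnit(0xdc00 + (offset & 0x3ff));
    } else {
      pushUnit(cp);
    }
  }
  return bytes;
}
```

The byte sequences differ, but both decode to the same code points, so a profile's rules give the same result either way.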
I notice that RFC 5890 (for IDNA) has text like this:

   o  A "U-label" is an IDNA-valid string of Unicode characters, in
      Normalization Form C (NFC) and including at least one non-ASCII
      character, expressed in a standard Unicode Encoding Form (such as
      UTF-8).

Text similar to that might be best for 7613bis and 7700bis.
Peter