On Wed, Jun 6, 2018 at 3:45 AM Richard Levitte <[email protected]> wrote:
> In message <[email protected]> on Tue, 5 Jun 2018 18:37:21 -0400,
> Viktor Dukhovni <[email protected]> said:
>
> openssl-users> > On Jun 3, 2018, at 4:45 AM, Richard Levitte <[email protected]> wrote:
> openssl-users> >
> openssl-users> > Yeah, I just learned that myself.  Somehow, I thought wchar_t would be
> openssl-users> > Unicode characters.  So ok, with this information, UTF-8 makes
> openssl-users> > sense...
> openssl-users>
> openssl-users> Nico has convinced me that the mapping from UTF-8 to BMPString should
> openssl-users> be UTF-16, which is agrees with the BMP representation on the code
> openssl-users> points in the Basic Multinational Plane, but also supports surrogate
> openssl-users> pairs for code points outside the plane, so that if someone wanted
> openssl-users> to use "emoji" (or more traditional glyph outside the BMP) for their
> openssl-users> password, they could.  This is a strict superset of UCS-2 and avoids
> openssl-users> having to reject some UTF-8 codepoints.
>
> Yup.  It seems that BMPString evolved from UCS-2 into UTF-16 at some
> point, and that evolution affected PKCS#12 objects...

Is there a spec citation for this, or some documented experiments against
other implementations' behavior? (What do Microsoft and NSS do here?) I was
pondering something similar recently, but things do seem to point at UCS-2
right now. UCS-2 is indeed an unfortunate historical wart, but X.680 says:

> BMPString is a subtype of UniversalString that has its own unique tag and
> contains only the characters in the Basic Multilingual Plane (those
> corresponding to the first 64K-2 cells, less cells whose encoding is used
> to address characters outside the Basic Multilingual Plane) of ISO/IEC
> 10646.

RFC 7292 just says to use a BMPString. That doesn't suggest anyone has
actually updated it for UTF-16.

This is fine for X.509, where BMPString is one of many possible string types
and folks can use UTF8String for this anyway. For PKCS#12, yeah, this
introduces limitations that may be worth resolving, UTF-16 being the obvious
fix. But if it's not in a spec, we should get it into one and also be clear
on whether this is OpenSSL inventing a behavior or following de facto
behavior established elsewhere.
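
To make the difference concrete, here is a rough Python sketch of the two
readings of "BMPString" for a password. The function names are made up for
illustration and this is not what OpenSSL, NSS, or Microsoft actually do; it
only shows where strict UCS-2 and UTF-16 diverge. It also leaves out the
two-byte NUL terminator that PKCS#12 puts on the password.

```python
def to_bmpstring_ucs2(password: str) -> bytes:
    """Strict UCS-2 reading of X.680: BMP code points only.

    Hypothetical helper; rejects anything that would need a surrogate pair.
    """
    for ch in password:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the Basic Multilingual Plane")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
    # For BMP-only input, UTF-16BE and UCS-2 (big-endian) are byte-identical.
    return password.encode("utf-16-be")


def to_bmpstring_utf16(password: str) -> bytes:
    """UTF-16 reading: non-BMP code points become surrogate pairs.

    Hypothetical helper; Python's codec raises UnicodeEncodeError on lone
    surrogates, so those are still rejected.
    """
    return password.encode("utf-16-be")


if __name__ == "__main__":
    bmp_only = "p\u00e4ss"        # all code points inside the BMP
    with_emoji = "p\U0001F600"    # U+1F600 is outside the BMP

    # Inside the BMP the two readings agree byte for byte.
    print(to_bmpstring_ucs2(bmp_only).hex())
    print(to_bmpstring_utf16(bmp_only).hex())

    # Outside the BMP only the UTF-16 reading produces an encoding.
    print(to_bmpstring_utf16(with_emoji).hex())
    try:
        to_bmpstring_ucs2(with_emoji)
    except ValueError as exc:
        print("UCS-2 rejects:", exc)
```

For BMP-only passwords the two produce the same bytes, so the interop
question only arises for passwords containing code points above U+FFFF.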
_______________________________________________
openssl-project mailing list
[email protected]
https://mta.openssl.org/mailman/listinfo/openssl-project
