Re: [Openexr-devel] UTF-8

Florian Kainz Wed, 14 Nov 2012 12:08:08 -0800


The ACES image container specification, meant to be compatible with OpenEXR,
prescribes UTF-8 for the representation of strings.  Therefore I suggest
that OpenEXR adopt the following rules:


- All text strings are to be interpreted as Unicode, encoded as UTF-8.
  This includes attribute names and strings contained in attributes,
  for example, channel names.

- Text strings stored in files must be in Normalization Form C (NFC,
  canonical decomposition followed by canonical composition).

- Where text strings need to be collated, strcmp() is used to compare
  the corresponding char sequences:  string A comes before (or is less
  than) string B if

    strcmp(A,B) < 0

  (Note: this is not ambiguous; the C99 standard specifies that strcmp()
  interprets the bytes that make up a string as unsigned.)

- Text strings passed to the IlmImf library must be encoded as UTF-8
  and in Normalization Form C.
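
The collation rule above can be sketched in C as follows.  The helper
name collates_before() is made up for this illustration and is not part
of IlmImf.  Since strcmp() is only guaranteed to return some negative
value when A sorts first, the sketch tests for "< 0".  Because UTF-8
preserves code point order under unsigned byte comparison, the result
also matches ordering by Unicode code point.

```c
#include <string.h>

/* Hypothetical helper, not an IlmImf function.
   Returns nonzero if UTF-8 string a collates before UTF-8 string b.
   strcmp() compares bytes as unsigned char, so for valid UTF-8 this
   is the same as comparing the strings code point by code point. */
int collates_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

For example, "Z" (U+005A) collates before "\xC3\xA9" (U+00E9), which in
turn collates before "\xE2\x82\xAC" (U+20AC).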

As far as I can tell, these rules are entirely compatible with all
existing versions of the IlmImf library.  Users whose writing system
includes non-ASCII Unicode characters can continue to employ the
existing library versions without change.

Future versions of the library should verify that text strings are
valid UTF-8.  In addition, the library should either verify that
strings are normalized to NFC, or normalize to NFC on the fly.
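
As a rough illustration of the first check, here is a minimal sketch of
a UTF-8 validity test in C.  The function name is_valid_utf8() is an
assumption for this example; it is not an existing IlmImf function.  The
sketch covers only well-formedness.  Checking or producing NFC requires
Unicode character data and would normally be delegated to a Unicode
library, so it is not shown here.

```c
/* Hypothetical helper, not an IlmImf function.
   Returns 1 if the NUL-terminated string s is well-formed UTF-8, else 0.
   Rejects overlong encodings, UTF-16 surrogates (U+D800..U+DFFF) and
   code points above U+10FFFF. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    while (*p) {
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest code point allowed for this length */
        int n;              /* number of continuation bytes expected */
        int i;

        if (*p < 0x80)                { p++; continue; }  /* ASCII */
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; n = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; n = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; n = 3; min = 0x10000; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        p++;
        for (i = 0; i < n; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return 0;  /* missing continuation byte (also catches NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;  /* overlong, out of range, or surrogate */
    }
    return 1;
}
```

A library could apply such a check to attribute names and string
attribute values when a file is written or read; whether a failure
should be an error or a warning is a separate design decision.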


Florian


_______________________________________________
Openexr-devel mailing list
Openexr-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/openexr-devel
