On 2013/01/06 7:21, Costello, Roger L. wrote:
> Does this mean that when exchanging Unicode data across the Internet the
> endianness is not relevant?
> Are these stated correctly:
> When Unicode data is in a file we would say, for example, "The file contains
> UTF-32BE data."
> When Unicode data is in memory we would say, "There is UTF-32 data in
> memory."
> When Unicode data is sent across the Internet we would say, "The UTF-32 data
> was sent across the Internet."
The first is correct. The second is correct. The third is wrong. The
Internet deals with data as a series of bytes, and by its nature has to
pass data between big-endian and little-endian machines. Therefore,
endianness is very important on the Internet. So you would say:
"The UTF-32BE data was sent across the Internet."
Actually, as far as I'm aware, the labels UTF-16BE and UTF-16LE were
first defined at the IETF, see
http://tools.ietf.org/html/rfc2781#appendix-A.1.
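
To make the difference between the BE and LE labels concrete, here is a
minimal Python sketch. The sample character (U+20AC, EURO SIGN) and the
variable names are my own choices for illustration; they do not come from
the RFC.

```python
import struct

# Illustrative sample only: U+20AC EURO SIGN.
code_point = 0x20AC
text = chr(code_point)

# UTF-32 is one 32-bit unit per code point; only the byte order differs.
utf32_be = struct.pack(">I", code_point)  # most significant byte first
utf32_le = struct.pack("<I", code_point)  # least significant byte first

# Python's codecs with an explicit byte order produce the same bytes (no BOM).
assert utf32_be == text.encode("utf-32-be")
assert utf32_le == text.encode("utf-32-le")

# For a BMP character, UTF-16 is one 16-bit unit, again in either byte order.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

for label, data in [
    ("UTF-32BE", utf32_be),
    ("UTF-32LE", utf32_le),
    ("UTF-16BE", utf16_be),
    ("UTF-16LE", utf16_le),
]:
    print(f"{label:9} {data.hex()}")
```

The same character yields a different byte sequence under each label, so
a receiver that is given only the bytes needs the label (or a byte order
mark) to decode them correctly.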
Because of these byte-order issues, Internet protocols mostly prefer UTF-8
over UTF-16 (or UTF-32), and the data actually exchanged is also
predominantly UTF-8. So it would be better to say:
When Unicode data is sent across the Internet we would say, "The UTF-8
data was sent across the Internet."
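
As a small illustration of why UTF-8 needs no BE/LE label, the following
Python sketch encodes a few characters and decodes them again without any
byte-order information. The sample characters are arbitrary picks of mine.

```python
# Illustrative samples only: 1-, 2-, 3- and 4-byte UTF-8 sequences.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    wire = ch.encode("utf-8")  # the bytes that would go over the network
    # UTF-8 is defined directly as a sequence of bytes, so the receiver
    # can decode it without knowing the sender's byte order.
    assert wire.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {wire.hex()}")

euro_utf8 = "\u20ac".encode("utf-8")
```

Because the encoding unit is a single byte, there is only one possible
serialization, which is the practical reason a plain "UTF-8" label is
sufficient on the wire.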
Regards, Martin.