On Sun, 21 Nov 2010 10:17:57 -0800, Raymond Hettinger <raymond.hettin...@gmail.com> wrote:
> On Nov 21, 2010, at 9:38 AM, R. David Murray wrote:
> > I'm sorry, but I have to disagree. As a relative unicode ignoramus,
> > "UCS-2" and "UCS-4" convey almost no information to me, and the bits I
> > have heard about them on this list have only confused me.
[...]
> From a user's point of view, the actual encoding or encoding name
> doesn't matter much. They just need to be able to predict the relevant
> behaviors (memory consumption and len/slicing behavior).
>
> For the narrow build, that behavior is:
> - Characters in the BMP consume 2 bytes and count as one char
>   for purposes of len and slicing.
> - Characters above the BMP consume 4 bytes and count as
>   two distinct chars for purposes of len and slicing.
>
> For wide builds, all characters are 4 bytes and count as a single
> char for len and slicing.
>
> Hope this helps,

Thank you, that nicely summarizes and confirms what I thought I knew
about wide versus narrow builds. And as I said, using the names
UCS-2/UCS-4 would only *confuse* that understanding, not clarify it.

--
R. David Murray      www.bitdance.com
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
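The narrow/wide behavior summarized in the quoted text can be modelled with a small sketch. It assumes that a narrow build stores text as UTF-16 code units (2 bytes each) and that a wide build stores one 4-byte unit per code point. The sketch runs on Python 3.3 and later, where PEP 393 removed the narrow/wide distinction and len() always counts code points. The helper names (narrow_len, wide_len, narrow_bytes, wide_bytes, narrow_code_units) are hypothetical. The byte counts cover character data only and ignore per-object overhead.

```python
# Hypothetical emulation of the old narrow vs. wide CPython builds.
# Assumption: narrow = UTF-16 code units (2 bytes each),
#             wide   = one 4-byte unit per code point.

def narrow_code_units(s):
    """Code units a narrow build would store: non-BMP chars become surrogate pairs."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def narrow_len(s):
    """What len() would report on a narrow build (number of UTF-16 code units)."""
    return len(narrow_code_units(s))

def wide_len(s):
    """What len() would report on a wide build (number of code points)."""
    return len(s)

def narrow_bytes(s):
    """Approximate character-data size on a narrow build: 2 bytes per code unit."""
    return 2 * narrow_len(s)

def wide_bytes(s):
    """Approximate character-data size on a wide build: 4 bytes per code point."""
    return 4 * wide_len(s)


if __name__ == "__main__":
    bmp = "\u00e9"             # LATIN SMALL LETTER E WITH ACUTE, inside the BMP
    astral = "\U0001F600"      # code point above the BMP
    for label, s in (("BMP", bmp), ("above BMP", astral)):
        print(label, narrow_len(s), narrow_bytes(s), wide_len(s), wide_bytes(s))
    print([hex(u) for u in narrow_code_units("a" + astral)])
```

A BMP character comes out as one unit and 2 bytes on the modelled narrow build. A character above the BMP comes out as two units (a surrogate pair) and 4 bytes. Both count as one unit and 4 bytes on the modelled wide build, which matches the summary above.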