Re: Counting characters or bytes in UTF-8?

Antoine Leca Tue, 12 Sep 2000 01:43:23 -0700

Yves Arrouye wrote:
> 
> > 2. The original intent of strncpy() was to provide a means of copying both
> > bytes and characters. Since the assumption was 1 byte == 1 char, there was
> > no problem with this. In addition to the problem in #1, though, UTF-8
> > introduces these issues:
> 
> I've always looked at the strxxx() functions as manipulating characters
> (strings of), and the memxxx() ones (memcpy, memcmp, actually bxxx() in my
> time) as manipulating bytes.

Unfortunately, the C Standard legislated it the other way round:
the count arguments of both the memxxx() *and* the strnxxx()
functions are clearly specified as byte counts, not as (multibyte)
character counts.
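
To make the byte-count point concrete, here is a minimal sketch of what
happens when strncpy() is handed UTF-8 text. The helper name copy_bytes()
is hypothetical and only for illustration; it assumes the destination
buffer holds at least n + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n bytes of src into dst and always
 * NUL-terminate. dst must have room for n + 1 bytes.
 * Returns the number of bytes now stored in dst (excluding the NUL). */
size_t copy_bytes(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);  /* n is a byte count, not a character count */
    dst[n] = '\0';         /* strncpy() adds no NUL if src has n or more bytes */
    return strlen(dst);
}
```

With the 5-byte UTF-8 string "\xC3\xA9t\xC3\xA9" (three characters, the
first and last encoded on two bytes each), copy_bytes(buf, s, 4) copies
four bytes and leaves a lone 0xC3 lead byte at the end of buf, that is,
half of the last character.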

As far as I know, all implementations with more-than-1-byte characters
(in practice the East Asian ones, and the European ones for the
Videotext codesets and the related T.51/T.61) take the short and easy way
and use byte counts. Some invented special supplementary functions to
deal with multibyte character counts, for example to deal with "widths".
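
For UTF-8 specifically, such a supplementary function could look like the
sketch below. The name utf8_char_count() is made up for this example and
is not part of any standard library. It assumes the input is valid UTF-8,
and it counts encoded characters, not display "widths".

```c
#include <stddef.h>

/* Hypothetical helper: count the characters in a NUL-terminated UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that does not match that pattern starts a new character.
 * Assumes valid UTF-8; malformed input is not detected. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the same string, strlen() reports bytes while utf8_char_count()
reports characters, so the two only agree on pure ASCII text.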

Antoine
