Ienup Sung wrote:
>
> Well, on the contrary to what you said, it is a very good option since you
> don't have to know anything about what's inside the character bytes which
> means by using the mblen/mbrlen, you can achieve codeset independent
> programming that will support not only Unicode/UTF-8 but also any other major
> codesets in the world.
This assumes that the C standard library the end user is actually running
really does support UTF-8. My current impression is that, unfortunately,
I cannot rely on that assumption unless (as you note) I require full
Unicode conformance *on the part of the underlying platform*, which is
in practice still a heavy requirement these days, at least for me.
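To make the dependency concrete, here is a minimal sketch of the kind of
codeset-independent loop under discussion. The names count_chars and
have_utf8_locale are hypothetical, as is whatever locale name is passed in;
the only library calls are the standard setlocale() and mbrlen(). The result
is only as good as the locale support of the C library underneath.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Count the characters in s according to the current LC_CTYPE locale.
   Returns -1 if the bytes are invalid or truncated in that codeset. */
long count_chars(const char *s)
{
    mbstate_t st;
    size_t left;
    long n = 0;

    assert(s != NULL);
    left = strlen(s);
    memset(&st, 0, sizeof st);           /* start in the initial shift state */

    while (left > 0) {
        size_t len = mbrlen(s, left, &st);
        if (len == (size_t)-1 || len == (size_t)-2)
            return -1;                   /* invalid or incomplete sequence */
        if (len == 0)
            break;                       /* NUL: defensive, strlen() excludes it */
        s += len;
        left -= len;
        n++;
    }
    return n;
}

/* Hypothetical probe: can this platform decode UTF-8 under the given
   locale name?  Note that it changes the process-wide LC_CTYPE locale. */
int have_utf8_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;                        /* locale not installed */
    return count_chars("\xC3\xA9") == 1; /* U+00E9 encoded as two bytes */
}
```

If have_utf8_locale() fails for every name one tries, the loop above is of
no help for UTF-8 input on that platform, which is exactly my worry.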
> Also, what I meant by is mbrlen/mblen kind of interfaces; of course if
> you want to deal with stateful encoding then you obviously need to use
> mbrlen() that are rather recently added at ISO C MSE/XPG5.
Published in Spring 1995 (I am speaking about ISO C).
I know it has been added "rather recently" in real-world
implementations though... And I have some ideas about the
underlying reasons.
> Your argument on mblen doesn't work for BIG5 as a living proof, all
> Unix systems that have BIG5 locale work fine and perfectly with/at
> mblen/mbrlen with the BIG5 locale.
Sorry, I wasn't clear. My point was that you cannot call mblen() with an
arbitrary pointer into the string: the result would be meaningless. You first
need to be sure it points at a lead byte or a single-byte character.
OTOH, with UTF-16, you get meaningful results.
And yes, this is a very minor point.
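A rough sketch of what I mean, with hypothetical function names. It assumes
the usual Big5 byte ranges (trail bytes 0x40-0x7E and 0xA1-0xFE, lead bytes
somewhere in 0x81-0xFE depending on the vendor extension) and the UTF-16
surrogate ranges from the Unicode standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

int utf16_is_lead(uint16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
int utf16_is_trail(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Given any index i into a UTF-16 buffer, return the index where the
   character containing buf[i] starts.  One step back is always enough. */
size_t utf16_char_start(const uint16_t *buf, size_t i)
{
    assert(buf != NULL);
    if (i > 0 && utf16_is_trail(buf[i]) && utf16_is_lead(buf[i - 1]))
        return i - 1;
    return i;
}

/* Big5: could this byte be the second half of a two-byte character? */
int big5_can_be_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0xA1 && b <= 0xFE);
}

/* Big5: could this byte begin a character (ASCII or a lead byte)? */
int big5_can_start(unsigned char b)
{
    return b < 0x80 || (b >= 0x81 && b <= 0xFE);
}

/* A byte is ambiguous when, seen in isolation, it could be either a
   character start or a trail byte.  Calling mblen() at such a byte
   without knowing the preceding context gives a meaningless answer. */
int big5_byte_is_ambiguous(unsigned char b)
{
    return big5_can_be_trail(b) && big5_can_start(b);
}
```

So the byte 0x41 in a Big5 string may be the letter 'A' or the second half
of a two-byte character, and only a scan from a known boundary can tell;
a UTF-16 code unit always identifies its own role.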
> Therefore, I argue your argument on mblen and such not working with
> BIG5 and ISO-2022-JP not true and mis-leading.
I never said nor implied that they are not working. I cannot understand which
sentence of mine may have led to that conclusion. I just said they are not
currently *working* with UTF-8 input, which is quite different.
I also said that mblen on any DBCS encoding (and _this_ includes Big-5
or ISO-2022-JP) is more clumsy than the equivalent on UTF-16. Which
is a matter of taste I do not want to discuss any further, since as Michka
rightly points out, any discussion about religious matters is useless.
So you can consider you are right.
Antoine