Marcin 'Qrczak' Kowalczyk wrote:
> Yes, IMHO all general-purpose languages should support processing
> arrays of bytes, in addition to Unicode strings.
C is likely to retain the behavior of the str functions. However, that puts a lot of burden on developers: they must identify every opaque string and consistently handle it with those functions throughout the application (or, even worse, throughout a suite of applications not necessarily written by the same company).
Newer languages are often designed on the assumption that all you need is a good class for Unicode strings. Instead of making them change that assumption, we could look for a way to make it true.
If no solution can be found that breaks nothing in Unicode, then consider one that does break something, but check what the broken part really affects. For example, we assume it MUST be possible to represent any valid Unicode string in any UTF stream and get it back unchanged. Suppose you find a solution that retains that capability for all Unicode code points except 128 of them. If you know that those 128 will ONLY be used for one particular purpose, you might be willing to accept that whoever uses them has to deal with the problem, while for everyone else the rules haven't really changed. What I am saying is that we need to preserve the intent of the existing rules, not the rules themselves.
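To make the trade-off concrete: Python's `surrogateescape` error handler (PEP 383) is one existing scheme of this kind, though not necessarily the conversion discussed here. It reserves exactly 128 code points, the lone surrogates U+DC80..U+DCFF, one for each byte 0x80..0xFF that can appear in ill-formed UTF-8. The sketch below assumes Python 3; the helper names `decode_opaque` and `encode_opaque` are hypothetical. It shows arbitrary bytes surviving a decode/encode round trip, and that a strict UTF-8 encoder refuses the escaped code points.

```python
# Hypothetical helpers around Python's built-in "surrogateescape" handler.
# Every byte that is not part of a valid UTF-8 sequence is mapped to one of
# 128 reserved code points (U+DC80..U+DCFF) and mapped back on encoding.

def decode_opaque(raw: bytes) -> str:
    """Decode arbitrary bytes as UTF-8, escaping invalid bytes."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_opaque(text: str) -> bytes:
    """Inverse of decode_opaque: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    samples = [
        "résumé.txt".encode("utf-8"),  # valid UTF-8
        b"r\xe9sum\xe9.txt",           # Latin-1 bytes, invalid as UTF-8
        b"\xff\xfe\x80",               # no valid UTF-8 at all
    ]
    for raw in samples:
        text = decode_opaque(raw)
        assert encode_opaque(text) == raw  # lossless round trip
        print(repr(raw), "->", ascii(text))

    # The "broken" part: a strict encoder rejects the reserved code points.
    try:
        decode_opaque(b"\xff").encode("utf-8")
    except UnicodeEncodeError:
        print("strict UTF-8 refuses the escaped code points")
```

Python picked lone surrogates because they are not valid Unicode scalar values, so valid Unicode strings are never affected; the cost is confined to strings that were not valid Unicode to begin with.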
But again, this applies only if I were proposing that everybody start using my conversion everywhere, which at this point I am not.
>
> It's not clear however how the API of filenames should look like,
> especially if they wish to be portable to Windows.
I intend to bring up the issue in the near future, but I'd like to let everyone catch their breath before that.
> or delimit the filename with "\0", or prefix it with
> the length, or something like this.
I don't see why that would be necessary or useful.
> A backup software should do this
> and not pay attention to the locale. But for end-user software like
> an image viewer, processing arbitrary filenames is less important.
You have to pay attention to the locale eventually. You need to report which file failed to be backed up (or is infected with a virus), and you should be able to let the user restore a single file. If you don't interpret the name according to the locale (possibly UTF-8), the user won't know how to select the file she wants, and it is even worse if she wants to enter the filename manually. All this CAN be done within the application, but it is very cumbersome. It gets worse if you want to pass the information to other software, since the other application may not have an interface that accepts opaque strings, and if it does, its convention may differ. This is why I am saying that something should be standardized. Of course, standardizing a poor solution is not a good idea; we should do our best to find a good one.
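As an illustration of the "can be done within the application" route, here is a minimal sketch assuming Python 3. The names `display_name` and `build_index` are hypothetical. The tool keeps each filename as raw bytes (the authoritative key) and derives a separate display string for the user, using Python's `backslashreplace` handler so undecodable bytes show up as `\xNN` escapes instead of being lost.

```python
# Hypothetical sketch: raw bytes stay the authoritative filename; a separate
# display string is derived for reports and for the user to select from.

def display_name(raw: bytes) -> str:
    """Decode as UTF-8 for display; invalid bytes become \\xNN escapes."""
    return raw.decode("utf-8", errors="backslashreplace")


def build_index(raw_names):
    """Map each display string back to the raw bytes it came from."""
    return {display_name(raw): raw for raw in raw_names}


if __name__ == "__main__":
    stored = [b"report.txt", "café.txt".encode("utf-8"), b"caf\xe9.txt"]
    index = build_index(stored)
    for shown in index:
        print("failed to back up:", shown)

    # The user selects (or types) a display name; we restore by raw bytes.
    assert index["caf\\xe9.txt"] == b"caf\xe9.txt"
```

This private convention is exactly the kind of thing another application will not understand. It is also not injective: a file whose name literally contains the characters `\xe9` would collide with the escaped form, so a real tool would need extra bookkeeping.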
> Technically they are binary (command line arguments must not contain
> zero bytes). Users are expecting stdin and stdout to be treated as
> text or binary depending on the program, while command line arguments
> are generally interpreted as text or filenames.
So an application that outputs filenames has a binary stdout, and no text-processing application is guaranteed to be able to handle that output.
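A minimal sketch of what "binary stdout" means in practice, assuming Python 3 on a POSIX system. Passing a bytes path makes `os.listdir` return names as raw bytes with no locale decoding, and the output goes to `sys.stdout.buffer` rather than the text stream. The NUL separator follows the `find -print0` / `xargs -0` convention, one of the options mentioned in the quoted text above, not a recommendation; the helper name `filenames_as_bytes` is hypothetical.

```python
import os
import sys


def filenames_as_bytes(names, sep: bytes = b"\0") -> bytes:
    """Join raw filenames with a separator; the result is binary, not text."""
    return b"".join(name + sep for name in names)


if __name__ == "__main__":
    # A bytes argument makes os.listdir return undecoded byte names.
    names = sorted(os.listdir(b"."))
    # Write to the underlying binary buffer, bypassing text encoding.
    sys.stdout.buffer.write(filenames_as_bytes(names))
    sys.stdout.buffer.flush()
```

A consumer that reads this as text in the current locale may reject or mangle some names, which is the point made above: the output is only reliably usable by programs that treat it as bytes.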
Lars

