> > *Which* multibyte?
>
> This is the key point. The answer on a Unix-like system is "the one you get
> when you do wctomb()". And that routine knows what to do by virtue of

You do know that wctomb() has nothing to do with Perl's Unicode implementation?


And that wctomb() is not defined to be Unicode?

And that Perl's core does not really know about any encoding wider than
8 bits other than Unicode?

(Also, being controlled by locales is folly. Trust me on this.)

I also think you are reading too much into "use encoding 'foo'". It doesn't
mean "use 'foo' for *everything*". And it's misleading: internally Perl is
still using Unicode (encoded in UTF-8). Even if you say "use encoding 'sjis'"
and concatenate $a and $b by saying $a.$b, Perl is not concatenating SJIS but
UTF-8. (That's the same reason why your mkdir created a "UTF-8 filename".)
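The general pattern (decode once at the boundary, work in Unicode internally, and get whatever the internal encoding is when raw bytes leak out) can be sketched outside Perl. This is only an illustration in Python under my own assumptions, not Perl's actual implementation; the variable names are made up for the example.

```python
# Illustration only: "source text" arrives as Shift-JIS bytes.
sjis_a = "日本".encode("shift_jis")
sjis_b = "語".encode("shift_jis")

# Decoding at the boundary turns both into Unicode strings.
a = sjis_a.decode("shift_jis")
b = sjis_b.decode("shift_jis")

# Concatenation happens on Unicode strings, not on Shift-JIS bytes.
joined = a + b

# If the result is handed to the OS as UTF-8 (an assumption that mirrors
# the mkdir case above), the bytes differ from the Shift-JIS ones.
utf8_name = joined.encode("utf-8")
sjis_name = sjis_a + sjis_b

print(len(sjis_name), len(utf8_name))
```

The two byte strings name the same text but are not interchangeable, which is why a file created this way shows up with a UTF-8 name.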


> setlocale() (or the default if setlocale is never called). Who knows how the
> file system stores the file name - it isn't important. But that's the way

Oh, but it is.


Imagine process A (to use your model) having an 8-bit locale,
let's say some Hebrew locale, creating files in directory /foo/bar.

Then imagine process B having a UTF-8 locale, creating files in directory /foo/bar.

Then imagine process C having a UTF-16 locale, creating files in directory /foo/bar.

Along comes process D having a UTF-16 locale PLUS NFC normalization, trying to
make sense of the filenames in directory /foo/bar.


From the viewpoint of each of the processes A-C everything worked just fine,
and process D is happily expecting to find nice tasty _normalized_ UTF-16
filenames. Boy, is it going to be disappointed.
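The A-D scenario can be simulated without touching any real directory. The sketch below is an illustration in Python under stated assumptions: the Hebrew name, the choice of ISO-8859-8 for the "8-bit Hebrew locale", and little-endian UTF-16 are all picked for the example, and the dictionary stands in for /foo/bar.

```python
# One and the same name, as each process would write it in its own encoding.
name = "\u05e9\u05dc\u05d5\u05dd"  # a four-letter Hebrew word

written = {
    "A (ISO-8859-8)": name.encode("iso8859_8"),
    "B (UTF-8)": name.encode("utf-8"),
    "C (UTF-16)": name.encode("utf-16-le"),
}

def read_as_process_d(raw: bytes) -> str:
    """Process D interprets every stored name as UTF-16."""
    # errors="replace" keeps the sketch from raising on undecodable input.
    return raw.decode("utf-16-le", errors="replace")

# Which of the stored names does process D read back correctly?
readable = {who: read_as_process_d(raw) == name for who, raw in written.items()}

for who, ok in readable.items():
    print(f"{who}: {'ok' if ok else 'garbage'}")
```

Three writers, three different byte sequences for one name, and the reader recovers only the one that happens to match its own locale.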


> you interact with it. There may be Unix-style file systems out there that
> work differently - please tell me which one.

I already told you: Apple HFS+ is pretty Unix-style, it is certainly
used in a UNIX, and it uses normalized UTF-8. None of that wc/mb stuff.
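What "normalized" means for a filename can be shown with Python's standard unicodedata module. This is a sketch only: which normalization form a given file system applies is not stated above, so the decomposed form (NFD) is used here purely as an assumption for illustration.

```python
import unicodedata

# The same visible character, spelled two ways.
precomposed = "\u00e9"   # e-acute as a single code point (NFC form)
decomposed = "e\u0301"   # plain e followed by a combining acute accent (NFD form)

# As UTF-8 bytes the two spellings are different filenames.
nfc_bytes = precomposed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")

# A normalizing file system folds both spellings into one stored form
# (assumed here to be NFD), so the name you read back may not be
# byte-identical to the name you wrote.
stored = unicodedata.normalize("NFD", precomposed).encode("utf-8")

print(nfc_bytes, nfd_bytes, stored)
```

A program that compares the bytes it passed in with the bytes it gets back from a directory listing will see a mismatch on such a file system, with no locale involved at all.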


>  /* FindFirstFile() and stat() are buggy with a trailing
>   * backslash, so change it to a forward slash :-( */
>  case '\\':
>      strncpy(buffer, path, l-1);
>      buffer[l - 1] = '/';
>
> That isn't the only piece of Perl code that plays "flip the backslashes" or
> chops a final backslash - and all that code breaks Shift-JIS characters with
> 0x5C as a second byte.

Uhhh... from a Win32 API bug workaround you deduce that ... SJIS should work?
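For reference, the breakage described in the quoted text is easy to reproduce on plain bytes. The sketch below is Python, not the Perl source; flip_trailing_backslash is a hypothetical helper modeled on the quoted workaround, and the path is made up.

```python
# The kanji U+8868 encodes in Shift-JIS as the two bytes 0x95 0x5C,
# so its second byte is indistinguishable from an ASCII backslash.
kanji = "\u8868".encode("shift_jis")

# A made-up directory path whose last character is that kanji.
raw = b"C:\\data\\" + kanji

def flip_trailing_backslash(buf: bytes) -> bytes:
    """Hypothetical byte-wise workaround: turn a trailing '\\' into '/'."""
    if buf.endswith(b"\\"):
        return buf[:-1] + b"/"
    return buf

# The helper cannot tell a path separator from the tail of a character.
mangled = flip_trailing_backslash(raw)

print(raw, mangled)
```

The path did not end in a separator at all, yet its last byte gets rewritten and the final character is destroyed.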


> Regards,

I am sorry, but I do not think what you want can possibly work portably --
and I think Perl is first and foremost a portability layer.
Sure, you can write modules (and possibly have fixes / workarounds)
to access platform-specific solutions, but that's what CPAN is for.


Sorry to be so negative on this, but I did give it quite a bit of thought
during the Perl 5.8.0 process and I am pretty certain of my reasoning.

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen




