On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <s...@stfx.eu>
wrote:
> Did you read the actual conversation in the issue?
> https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters
> It has been renamed and there is a fix (as a change set, not as a slice,
> yet). Basically, there was a primitive call into a plugin that failed to
> do encoding.

No, I apologize; I missed the bug link. Thanks for reposting it.

> Now regarding the issues you raised. Pharo does not do Unicode
> canonicalisation or any of that other fancy stuff (like categorisation,
> proper ordering and so on). This is another orthogonal and way more
> general issue.
> Regarding the pathnames encoding: if the OS itself does not know it, how
> can we?

That's actually the argument *against* using UTF-8 as the standard Pharo
way to represent filenames--at least on Unix systems. If Pharo used
ByteArrays to represent paths, with convenience methods for working with
UTF-8 (since I do agree that's the most likely thing for a user/dev to
want), then you'd be able to work with all files no matter what, *and*
have a convenient way of doing so for the common case.
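
A rough sketch of that idea, written in Python rather than Pharo purely
for illustration. The class name BytePath and its methods are hypothetical
and do not correspond to any existing Pharo or Python API; the point is
only that the raw bytes stay authoritative and UTF-8 is a convenience
layered on top.

```python
class BytePath:
    """Hypothetical path type that keeps the exact bytes the OS reported."""

    def __init__(self, raw: bytes):
        self.raw = raw

    @classmethod
    def from_utf8(cls, text: str) -> "BytePath":
        # Convenience constructor for the common case.
        return cls(text.encode("utf-8"))

    def as_utf8(self) -> str:
        # Strict decode: raises UnicodeDecodeError if the name is not UTF-8.
        return self.raw.decode("utf-8")

    def display(self) -> str:
        # Lossy decode, for showing the name to a user only.
        # The raw bytes are never modified.
        return self.raw.decode("utf-8", errors="replace")


# Common case: a name created from text.
common = BytePath.from_utf8("r\u00e9sum\u00e9.txt")

# A name stored as Latin-1 bytes, which is not valid UTF-8.
legacy = BytePath(b"r\xe9sum\xe9.txt")
```

Here legacy.as_utf8() raises, but legacy.raw can still be handed back to
the OS unchanged, so the file stays reachable.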
This is an old discussion, and I do see both sides of it. In terms of
SCMs, Mercurial and Git both just say "it's a collection of bytes",
whereas Subversion says "it's Unicode code points." This has some
uncomfortable implications for both systems when working on multiple
platforms.
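
One concrete instance of such an implication, offered as an illustration
and not as something taken from the issue: the same visible name can be
encoded as two different byte sequences depending on Unicode
normalization. A "bytes" system sees two distinct names, while a "code
points" system has to pick a canonical form. A small Python sketch using
the standard unicodedata module:

```python
import unicodedata

name = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Precomposed form: a single code point for the accented letter.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")

# Decomposed form: plain "e" followed by a combining acute accent.
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

# Byte-for-byte the two names differ.
bytes_equal = nfc == nfd

# After normalizing both to the same form, they compare equal as text.
text_equal = (
    unicodedata.normalize("NFC", nfc.decode("utf-8"))
    == unicodedata.normalize("NFC", nfd.decode("utf-8"))
)
```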
--Benjamin