On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe <s...@stfx.eu> wrote:


Did you read the actual conversation in the issue?

 
https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

It has been renamed and there is a fix (as a change set, not yet as a slice). Basically, there was a primitive call into a plugin that failed to encode the path.


No, I apologize; I missed the bug link.  Thanks for reposting it.

Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue.

Regarding the pathname encoding: if the OS itself does not know it, how can we?

That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems, where a filename is just a sequence of bytes and need not be valid in any particular encoding. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.
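
A rough sketch of that design, in Python rather than Pharo since it is only meant to illustrate the idea. The BytePath class and its method names are hypothetical, not an existing Pharo or Python API: the raw bytes are the source of truth, and UTF-8 is a convenience layered on top.

```python
from typing import Optional


class BytePath:
    """Hypothetical path type: the raw bytes are the source of truth."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    @classmethod
    def from_utf8(cls, text: str) -> "BytePath":
        # Convenience constructor for the common case.
        return cls(text.encode("utf-8"))

    def as_utf8(self) -> Optional[str]:
        # Return None instead of raising when the bytes are not valid UTF-8,
        # so callers can still fall back to self.raw.
        try:
            return self.raw.decode("utf-8")
        except UnicodeDecodeError:
            return None


# Common case: the name round-trips through UTF-8.
common = BytePath.from_utf8("r\u00e9sum\u00e9.txt")

# A name written by a Latin-1 system: not valid UTF-8, but still representable.
legacy = BytePath(b"r\xe9sum\xe9.txt")
```

With this shape, legacy.as_utf8() gives None while legacy.raw is still usable to open or rename the file, which is the "work with all files no matter what" part.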

This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say "it's a collection of bytes", whereas Subversion says "it's Unicode code points." Each choice has some uncomfortable implications when working on multiple platforms.
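
One concrete instance, assuming two platforms store the same accented name in different Unicode normalization forms (the usual example is a decomposed form on older Mac filesystems versus a composed form elsewhere). This is a Python sketch and the variable names are purely illustrative: a bytes-based system sees two different names, while a code-point-based system has to decide whether and how to normalize.

```python
import unicodedata

name = "caf\u00e9"  # "café" with a precomposed e-acute

# The same text in composed (NFC) and decomposed (NFD) form, encoded as UTF-8.
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

# Compared as bytes, these are two distinct filenames.
distinct_as_bytes = nfc_bytes != nfd_bytes

# Compared after normalizing to one form, they are the same text.
same_as_text = (
    unicodedata.normalize("NFC", nfc_bytes.decode("utf-8"))
    == unicodedata.normalize("NFC", nfd_bytes.decode("utf-8"))
)
```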

--Benjamin
