On 24 Sep 2014, at 18:48, Benjamin Pollack <benja...@bitquabit.com> wrote:

> On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <hila...@drgeo.eu> wrote:
> 
>> Le 23/09/2014 14:09, Damien Cassou a écrit :
>>> I recently read documents about utf-8 encoding. In all of them, the
>>> author says that pathnames should be kept as is because you never know
>>> which encoding the filesystem uses. So, a filename should probably be
>>> a bytearray.
>> 
>> 
>> yes, but a #é should be encoded in two bytes.
> 
> As noted in my previous message, "é" could be represented as either one or 
> two Unicode code points, and these in turn could validly be either two or 
> three bytes in UTF-8.  My gut says that $é should be U+00E9, because 
> otherwise you should have to use two Characters ($e and $´), but you could 
> legitimately argue otherwise as well, and at any rate, #é could definitely be 
> either.  This is likely the core of the issue you're hitting.

Did you read the actual conversation in the issue?

https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters

The issue has been renamed and there is a fix (as a change set, not yet as a
slice). Basically, a primitive call into a plugin failed to do the encoding.

Now, regarding the issues you raised: Pharo does not do Unicode canonicalisation
or any of the other fancy stuff (categorisation, proper ordering and so on).
That is a separate and much more general issue.
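
To make the canonicalisation point concrete, here is a minimal sketch of the two
spellings of "é" discussed above. It uses Python's standard unicodedata module
purely for illustration; nothing here is Pharo API, and the variable names are
my own.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point, U+00E9
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# UTF-8 encodes the two spellings with different byte counts.
composed_bytes = composed.encode("utf-8")      # 2 bytes: C3 A9
decomposed_bytes = decomposed.encode("utf-8")  # 3 bytes: 65 CC 81

# Without canonicalisation the two strings are simply different.
same_without_normalising = composed == decomposed

# Canonical composition (NFC) maps both spellings to U+00E9.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

A system that skips this normalisation step treats the two spellings as
different strings, even though they render identically.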

Regarding the encoding of pathnames: if the OS itself does not know it, how can
we? I think that the current approach (assuming UTF-8) makes the most sense for
a system that runs on multiple platforms.
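
As a rough illustration of what "assuming UTF-8" means in practice, the sketch
below decodes raw file name bytes as UTF-8. It is Python, not the Pharo code
from the fix, and decode_path_assuming_utf8 is a hypothetical helper name. Bytes
written in another encoding (Latin-1 here) are rejected rather than silently
misread.

```python
def decode_path_assuming_utf8(raw: bytes) -> str:
    """Hypothetical helper: interpret raw file name bytes as UTF-8."""
    return raw.decode("utf-8")

# A name stored as UTF-8 decodes cleanly.
utf8_name = decode_path_assuming_utf8(b"caf\xc3\xa9.txt")

# The same name stored as Latin-1 is not valid UTF-8, so decoding fails.
try:
    decode_path_assuming_utf8(b"caf\xe9.txt")
    latin1_failed = False
except UnicodeDecodeError:
    latin1_failed = True
```

The failure case is the price of the assumption: a file system that really uses
another encoding needs explicit handling.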

Sven

