On 24/09/2014 18:48, Benjamin Pollack wrote:
> On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <hila...@drgeo.eu> wrote:
>
>> On 23/09/2014 14:09, Damien Cassou wrote:
>>> I recently read documents about utf-8 encoding. In all of them, the
>>> author says that pathnames should be kept as is because you never know
>>> which encoding the filesystem uses. So, a filename should probably be
>>> a bytearray.
>>
>>
>> Yes, but #é should be encoded in two bytes.
>
> As noted in my previous message, "é" could be represented as either
> one or two Unicode code points, and these in turn could validly be
> either two or three bytes in UTF-8.  My gut says that $é should be
> U+00E9, because otherwise you would have to use two Characters ($e
> and $´), but you could legitimately argue otherwise as well, and at
> any rate, #é could definitely be either.  This is likely the core of
> the issue you're hitting.
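A minimal sketch of the two representations of "é" described above, using Python's standard `unicodedata` module purely for illustration (this is not Pharo code). The precomposed form is U+00E9; its canonical decomposition (NFD) is "e" followed by U+0301 COMBINING ACUTE ACCENT. The two strings differ in both code-point count and UTF-8 byte count, and only compare equal after normalizing to the same form.

```python
import unicodedata

# "é" as a single precomposed code point, U+00E9 (NFC form).
composed = "\u00e9"
# Canonical decomposition (NFD): "e" (U+0065) + U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Code points vs. UTF-8 bytes for each form.
print(len(composed), len(composed.encode("utf-8")))
print(len(decomposed), len(decomposed.encode("utf-8")))

# The raw sequences differ, so a byte-wise or code-point-wise comparison fails...
print(composed == decomposed)
# ...until both sides are normalized to the same form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Comparing file names safely therefore requires agreeing on a normalization form first, or, as Damien suggests, treating the name as opaque bytes.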
As I understand it, #é should be encoded in two bytes, and only two bytes.
Only ASCII is encoded as a single byte in UTF-8.
See the Wikipedia article on UTF-8.
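A small sketch of the UTF-8 length rule, written in Python for illustration with a hypothetical helper `utf8_length`. It follows the standard RFC 3629 ranges: only U+0000..U+007F (ASCII) takes one byte. Note that the rule applies per code point, so precomposed é (U+00E9) takes two bytes, while the decomposed form is two code points and takes three bytes in total.

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 uses for one code point (hypothetical helper, RFC 3629 ranges)."""
    if code_point < 0x80:
        return 1  # U+0000..U+007F: ASCII
    if code_point < 0x800:
        return 2  # U+0080..U+07FF: includes U+00E9 (é) and U+0301
    if code_point < 0x10000:
        return 3  # U+0800..U+FFFF: e.g. U+20AC (€)
    return 4      # U+10000..U+10FFFF: e.g. emoji

# Cross-check the helper against Python's own UTF-8 encoder.
for ch in ["a", "\u00e9", "\u0301", "\u20ac", "\U0001F600"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```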

By the way, I tracked down the problem and provided a temporary fix.


Hilaire

-- 
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu

