On 24/09/2014 18:48, Benjamin Pollack wrote:
> On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire <hila...@drgeo.eu> wrote:
>
>> On 23/09/2014 14:09, Damien Cassou wrote:
>>> I recently read documents about utf-8 encoding. In all of them, the
>>> author says that pathnames should be kept as is because you never know
>>> which encoding the filesystem uses. So, a filename should probably be
>>> a bytearray.
>>
>>
>> yes, but a #é should be encoded in two bytes.
>
> As noted in my previous message, "é" could be represented as either
> one or two Unicode code points, and these in turn could validly be
> either two or three bytes in UTF-8. My gut says that $é should be
> U+00E9, because otherwise you should have to use two Characters ($e
> and $´), but you could legitimately argue otherwise as well, and at
> any rate, #é could definitely be either. This is likely the core of
> the issue you're hitting.

As I understand it, #é should be encoded in two bytes, and only two bytes. Only ASCII is encoded as one byte in UTF-8. See the reference on Wikipedia.
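
To make the byte counts discussed above concrete, here is a minimal sketch in Python (not Pharo, and not the fix mentioned below), using only the standard `unicodedata` module. It assumes the two representations in question are the precomposed code point U+00E9 and the decomposed sequence "e" followed by U+0301 (combining acute accent). The variable names are illustrative only.

```python
import unicodedata

# Precomposed form: a single code point, U+00E9.
composed = "\u00e9"

# Decomposed form (NFD): "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# One code point, two bytes in UTF-8.
print(len(composed), len(composed_bytes), composed_bytes)

# Two code points, three bytes in UTF-8 (one for "e", two for U+0301).
print(len(decomposed), len(decomposed_bytes), decomposed_bytes)

# Both sequences denote the same text once normalized to the same form.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)
```

So the two-byte encoding holds for U+00E9 itself, while a filename that arrives in decomposed form carries three bytes for the same visible character. Which form a given filesystem hands back is outside the scope of this sketch.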

Btw, I tracked the problem and provided a temporary fix.

Hilaire

--
Dr. Geo - http://drgeo.eu
iStoa - http://istoa.drgeo.eu