Re: g_filename_to_uri() issue in glib-win32

2012-05-23 Thread David Nečas
On Wed, May 23, 2012 at 06:48:31AM +0100, John Emmas wrote:
> But whatever that (second) character looked like, its decimal value would always be 246 (because the UTF-8 sequence C3 B6 translates to decimal 246). The URI translation of decimal 246 is %F6.

This is nonsense. Percent-encoding

Re: g_filename_to_uri() issue in glib-win32

2012-05-23 Thread Jürg Billeter
On Wed, 2012-05-23 at 06:48 +0100, John Emmas wrote:
> But whatever that (second) character looked like, its decimal value would always be 246 (because the UTF-8 sequence C3 B6 translates to decimal 246). The URI translation of decimal 246 is %F6.

U+00F6 is the Unicode codepoint but URI

Re: g_filename_to_uri() issue in glib-win32

2012-05-23 Thread John Emmas
On 23 May 2012, at 08:40, Jürg Billeter wrote:
> U+00F6 is the Unicode codepoint, but URI percent-encoding never uses codepoints directly: you can encode only a single byte at a time, and the range of Unicode codepoints is much larger than that (up to U+10FFFF). As Krzysztof already wrote,
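
The byte-at-a-time point can be illustrated with a minimal Python sketch. This is not GLib's implementation; `percent_encode_bytes` is a hypothetical helper written only for this illustration, and it assumes the unreserved set from RFC 3986 (letters, digits, `-`, `.`, `_`, `~`) rather than the exact set g_filename_to_uri() leaves unescaped.

```python
def percent_encode_bytes(data: bytes) -> str:
    """Percent-encode every byte outside the RFC 3986 unreserved set.

    Hypothetical helper for illustration; not a GLib function.
    """
    unreserved = (
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        b"abcdefghijklmnopqrstuvwxyz"
        b"0123456789-._~"
    )
    # Each escape (%XX) carries exactly one byte, never a whole codepoint.
    return "".join(
        chr(b) if b in unreserved else f"%{b:02X}"
        for b in data
    )


name = "G\u00f6ran"            # 'ö' written as an escape to avoid ambiguity
codepoint = ord(name[1])       # the Unicode codepoint of 'ö' (U+00F6)
utf8 = name.encode("utf-8")    # the bytes that actually get escaped
encoded = percent_encode_bytes(utf8)

print(hex(codepoint))          # the codepoint is a single number...
print(utf8.hex())              # ...but UTF-8 stores 'ö' as two bytes
print(encoded)                 # so two %XX escapes appear in the result
```

The codepoint (246) and the UTF-8 byte sequence (C3 B6) are different things; the escapes are produced from the bytes, which is why two of them appear for one character.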

Re: g_filename_to_uri() issue in glib-win32

2012-05-23 Thread John Emmas
On 23 May 2012, at 10:05, John Emmas wrote:
> Still a bit confused really... :-(

Not any more. My confusion arose from the fact that the notes for g_filename_to_uri() (i.e. the note inside gconvert.c) state that it's based on the requirements of RFC 2396:-

g_filename_to_uri() issue in glib-win32

2012-05-22 Thread John Emmas
I'm using the GLib function g_filename_to_uri() in glib-win32 (version 2.24). According to the documentation I should pass in a file path in the encoding format used by GLib (which on Windows is UTF-8). However, if I pass in a UTF-8 string, this function translates it character-by-character

Re: g_filename_to_uri() issue in glib-win32

2012-05-22 Thread Krzysztof Kosiński
2012/5/22 John Emmas <john...@tiscali.co.uk>:
> So for example, if the input string is Göran (encoded as UTF-8) I get the wrong output (hopefully, you can see that the 'o' has an umlaut). g_filename_to_uri encodes 6 characters and returns G%C3%B6ran instead of encoding just 5 characters to

Re: g_filename_to_uri() issue in glib-win32

2012-05-22 Thread John Emmas
On 23 May 2012, at 00:22, Krzysztof Kosiński wrote:
> What you get is a URI encoding of the UTF-8 bytes. I think this is the expected and correct behavior: there are multiple incompatible locale encodings and there's no way for this function to know what encoding you want to use for the URI.
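
To make the encoding ambiguity concrete, here is a small sketch using Python's standard `urllib.parse` module rather than GLib itself. It only illustrates the general point that the same name yields different URIs depending on which byte encoding is escaped; the choice of Latin-1 as the competing encoding is an assumption made for the example.

```python
from urllib.parse import quote, unquote

name = "G\u00f6ran"  # 'ö' written as an escape to avoid ambiguity

# The same five-character name, escaped from two different byte encodings.
as_utf8 = quote(name, encoding="utf-8")
as_latin1 = quote(name, encoding="latin-1")

print(as_utf8)    # escapes of the two UTF-8 bytes of 'ö'
print(as_latin1)  # escape of the single Latin-1 byte of 'ö'

# Decoding only round-trips when the reader assumes the writer's encoding.
print(unquote(as_utf8, encoding="utf-8") == name)
print(unquote(as_latin1, encoding="latin-1") == name)
print(unquote(as_latin1, encoding="utf-8") == name)
```

A consumer that is handed `%F6` has no way to tell which encoding produced it, whereas always escaping the UTF-8 bytes gives every reader the same rule to apply.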