On Wed, May 23, 2012 at 06:48:31AM +0100, John Emmas wrote:
But whatever that (second) character looked like, its decimal value would
always be 246 (because the UTF-8 sequence C3 B6 translates to decimal 246).
The URI translation of decimal 246 is %F6.
This is nonsense. Percent-encoding
On Wed, 2012-05-23 at 06:48 +0100, John Emmas wrote:
But whatever that (second) character looked like, its decimal value
would always be 246 (because the UTF-8 sequence C3 B6 translates to
decimal 246).
The URI translation of decimal 246 is %F6.
U+00F6 is the Unicode codepoint but URI
On 23 May 2012, at 08:40, Jürg Billeter wrote:
U+00F6 is the Unicode codepoint but URI percent encoding never directly
uses codepoints as you can encode only a single byte at a time and the
range of Unicode codepoints is much larger than that (up to U+10FFFF).
As Krzysztof already wrote,
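To make the codepoint-versus-byte distinction concrete, here is a minimal Python sketch. It is not GLib code; the helper name percent_encode_bytes is hypothetical and only illustrates that percent-encoding works on one byte at a time, so a codepoint has to be turned into bytes (here UTF-8) before it can be escaped.

```python
def percent_encode_bytes(data: bytes) -> str:
    """Escape every byte as %XX (hypothetical helper, for illustration only)."""
    return "".join(f"%{b:02X}" for b in data)

ch = "ö"

# The Unicode codepoint is a single number: U+00F6, decimal 246.
codepoint = ord(ch)

# The UTF-8 encoding of that codepoint is two bytes: C3 B6.
utf8_bytes = ch.encode("utf-8")

# Percent-encoding escapes those two bytes one at a time.
encoded = percent_encode_bytes(utf8_bytes)

# A codepoint above 0xFF cannot fit in a single %XX escape at all:
# U+20AC (the euro sign) is three bytes in UTF-8.
euro_encoded = percent_encode_bytes("€".encode("utf-8"))

print(codepoint, utf8_bytes.hex(), encoded, euro_encoded)
```

The value 246 never appears in the escaped form; only the bytes of the chosen encoding do.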
On 23 May 2012, at 10:05, John Emmas wrote:
Still a bit confused really... :-(
Not any more
My confusion arose from the fact that the note for g_filename_to_uri() (i.e.
the note inside gconvert.c) states that it's based on the requirements of RFC
2396:-
I'm using the GLib function g_filename_to_uri() in glib-win32 (version 2.24).
According to the documentation I should pass in a file path in the encoding
format used by GLib (which on Windows is UTF-8). However, if I pass in a UTF-8
string, this function translates it character-by-character
2012/5/22 John Emmas john...@tiscali.co.uk:
So for example, if the input string is Göran (encoded as UTF-8) I get the
wrong output (hopefully, you can see that the 'o' has an umlaut).
g_filename_to_uri encodes 6 characters and returns G%C3%B6ran instead of
encoding just 5 characters to
On 23 May 2012, at 00:22, Krzysztof Kosiński wrote:
What you get is a URI encoding of the UTF-8 bytes. I think this is
the expected and correct behavior: there are multiple incompatible
locale encodings and there's no way for this function to know what
encoding you want to use for the URI.
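The dependence on the byte encoding can be seen with Python's standard urllib.parse.quote, used here purely as a stand-in for illustration; it is not what g_filename_to_uri() calls internally. The same string yields different URIs depending on which encoding is applied before escaping:

```python
from urllib.parse import quote

name = "Göran"

# Escaping the UTF-8 bytes gives the form reported for g_filename_to_uri().
as_utf8 = quote(name, encoding="utf-8")

# Escaping the Latin-1 (ISO-8859-1) bytes gives the form the original
# poster expected, because in Latin-1 the byte value equals the codepoint.
as_latin1 = quote(name, encoding="latin-1")

print(as_utf8)    # G%C3%B6ran
print(as_latin1)  # G%F6ran
```

Both results are valid percent-encodings; they simply start from different byte sequences, which is why the function has to settle on a single encoding.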