On 9/25/19 7:58 PM, Georgios Petasis wrote:
Dear Massimo,

My advice is to use "encoding system" in your code, and act accordingly in the code (use or not use encoding convertfrom). This way, the code will work even in cases where you cannot control the settings Apache runs with.

Best,
George


Hi George

As I hinted in my first message in this thread, strings with accented characters were handled consistently until they went through ::rivet::escape_string, before being made into a URL.

The problem seems to be related to the byte string returned by this call in ::rivet::escape_string:

origString = Tcl_GetStringFromObj( objv[1], &origLength );

With both utf-8 and iso8859-1 system encodings the returned string is invariably the utf-8 byte representation, which at first made sense to me because I know that Tcl handles strings as utf-8 internally. I'm not questioning what Tcl_GetStringFromObj does, but shouldn't it at this point be replaced by some function that returns a byte string consistent with the locale?
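To make the kind of conversion I have in mind concrete, here is a minimal sketch in plain C, without the Tcl API, of turning a utf-8 byte string into iso8859-1 bytes. The name utf8_to_latin1 is hypothetical and not part of Rivet or Tcl, and it only covers code points up to U+00FF. In real code the Tcl encoding API (Tcl_UtfToExternalDString with the system encoding) would presumably be the place to look, but I haven't verified that it fits here.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Tcl or Rivet.
 * Converts a NUL-terminated utf-8 string to iso8859-1 bytes.
 * Only code points U+0000..U+00FF are handled (one- and two-byte
 * utf-8 sequences); anything else is replaced with '?'.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
size_t utf8_to_latin1(const char *src, char *dst, size_t dstSize)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dstSize == 0) {
        return (size_t)-1;
    }
    while (*s != '\0') {
        unsigned char out;

        if (*s < 0x80) {
            /* ASCII byte: copied unchanged */
            out = *s++;
        } else if ((*s & 0xFE) == 0xC2 && (s[1] & 0xC0) == 0x80) {
            /* lead byte 0xC2 or 0xC3 plus one continuation byte:
             * code points U+0080..U+00FF */
            out = (unsigned char)(((*s & 0x03) << 6) | (s[1] & 0x3F));
            s += 2;
        } else {
            /* outside latin1 or malformed: emit '?' and skip the sequence */
            out = '?';
            s++;
            while ((*s & 0xC0) == 0x80) {
                s++;
            }
        }
        if (n + 1 >= dstSize) {
            return (size_t)-1;   /* no room for this byte plus the NUL */
        }
        dst[n++] = (char)out;
    }
    dst[n] = '\0';
    return n;
}
```

Calling utf8_to_latin1("\xc3\xa9", buf, sizeof buf) leaves the single byte 0xe9 in buf, which is what a latin1 machine would expect to see before escaping.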

For example the accented character 'é', which has code 0xe9 as byte representation in latin1 (and the same code point in Unicode), is represented as 0xc3 0xa9 (utf-8 byte string) and it becomes %c3%a9. After this sequence of bytes has been unescaped it's returned by calling

Tcl_SetObjResult( interp, Tcl_NewStringObj( newString, -1 ) );

and the iso8859-1 machine represents it as Ã©
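The effect can be reproduced outside Tcl. Below is a small sketch in plain C with two hypothetical helpers (neither is the Rivet code): percent_escape escapes every byte that is not an ASCII letter or digit, which is a simplification of real URL escaping rules, and latin1_to_utf8 reads each byte as one iso8859-1 character, which is what I assume happens to the unescaped bytes on the latin1 machine.

```c
#include <stddef.h>

/* Hypothetical helper: percent-escape every byte that is not an ASCII
 * letter or digit. Simplified illustration, not ::rivet::escape_string.
 * Returns bytes written (excluding NUL) or (size_t)-1 if dst is too small. */
size_t percent_escape(const char *src, char *dst, size_t dstSize)
{
    static const char hex[] = "0123456789abcdef";
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dstSize == 0) {
        return (size_t)-1;
    }
    for (; *s != '\0'; s++) {
        int plain = (*s >= '0' && *s <= '9') ||
                    (*s >= 'a' && *s <= 'z') ||
                    (*s >= 'A' && *s <= 'Z');
        size_t need = plain ? 1 : 3;

        if (n + need >= dstSize) {
            return (size_t)-1;
        }
        if (plain) {
            dst[n++] = (char)*s;
        } else {
            dst[n++] = '%';
            dst[n++] = hex[*s >> 4];
            dst[n++] = hex[*s & 0x0F];
        }
    }
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: treat each input byte as one iso8859-1 character
 * and re-encode it as utf-8. Applied to bytes that were already utf-8,
 * this produces the doubled encoding seen as "Ã©".
 * Returns bytes written (excluding NUL) or (size_t)-1 if dst is too small. */
size_t latin1_to_utf8(const char *src, char *dst, size_t dstSize)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    if (dstSize == 0) {
        return (size_t)-1;
    }
    for (; *s != '\0'; s++) {
        size_t need = (*s < 0x80) ? 1 : 2;

        if (n + need >= dstSize) {
            return (size_t)-1;
        }
        if (*s < 0x80) {
            dst[n++] = (char)*s;
        } else {
            dst[n++] = (char)(0xC0 | (*s >> 6));     /* lead byte */
            dst[n++] = (char)(0x80 | (*s & 0x3F));   /* continuation byte */
        }
    }
    dst[n] = '\0';
    return n;
}
```

With these, percent_escape on the utf-8 bytes "\xc3\xa9" gives %c3%a9, while on the latin1 byte "\xe9" it gives %e9. Feeding "\xc3\xa9" to latin1_to_utf8 yields the four bytes 0xc3 0x83 0xc2 0xa9, i.e. the two characters Ã and ©.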

I'm trying to replace Tcl_GetString... with Tcl_GetByteArrayFromObj (and Tcl_NewByteArrayObj). The sequence of characters is correct, but there is some extra stuff in it that breaks things.

Still working (and wasting time) on it


 -- Massimo





