On 9/25/19 7:58 PM, Georgios Petasis wrote:
Dear Massimo,
My advice is to use "encoding system" in your code, and act accordingly
in the code (use or not use encoding convertfrom).
This way, the code will work even in cases you cannot control the
settings apache runs with.
Best,
George
Hi George
As I hinted in my first message in this thread, strings with accented
characters were handled consistently until they went through
::rivet::escape_string, before being made into a URL.
The problem seems to be related to the byte string returned by this call
in ::rivet::escape_string:

    origString = Tcl_GetStringFromObj( objv[1], &origLength );

With both utf-8 and iso8859-1 system encodings, the returned string is
invariably the utf-8 byte representation, which at first made sense to
me because I know that Tcl handles strings as utf-8 internally. I'm not
questioning what Tcl_GetStringFromObj does, but shouldn't it be replaced
at this point by some function that returns a byte string consistent
with the locale?
For example, the accented character 'é', which has code 0xe9 as its byte
representation in latin1 (and the same Unicode code point, U+00E9), is
represented as 0xc3 0xa9 (utf-8 byte string) and it becomes %c3%a9.
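
To make the byte-level difference concrete, here is a minimal sketch in plain C with no Tcl dependency. The helper name percent_escape is hypothetical and this is not Rivet's implementation of ::rivet::escape_string; it only assumes that every byte other than an ASCII letter or digit is written as %xx.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not Rivet's code: percent-escape every byte that is
 * not an ASCII letter or digit. dst must hold at least 3*len + 1 bytes. */
static void percent_escape(const unsigned char *src, size_t len, char *dst)
{
    static const char hex[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80 && isalnum(src[i])) {
            *dst++ = (char)src[i];          /* safe byte: copy unchanged */
        } else {
            *dst++ = '%';                   /* any other byte: emit %xx */
            *dst++ = hex[src[i] >> 4];
            *dst++ = hex[src[i] & 0x0f];
        }
    }
    *dst = '\0';
}
```

Given the two bytes 0xc3 0xa9 (the utf-8 form of 'é') this produces %c3%a9, while the single latin1 byte 0xe9 produces %e9. The escaping step itself is encoding-agnostic, so the URL reflects whichever byte representation was handed to it.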
After this sequence of bytes has been unescaped, it's returned by calling

    Tcl_SetObjResult( interp, Tcl_NewStringObj( newString, -1 ) );

and the iso8859-1 machine represents it as the two characters Ã©
(0xc3 and 0xa9 read as latin1).
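
The round trip can be sketched in plain C as follows. Both helpers (percent_unescape and utf8_to_latin1) are hypothetical names for illustration and are not Rivet or Tcl code; the converter is an assumption-laden toy that handles only code points up to U+00FF. Tcl's C API does provide Tcl_UtfToExternalDString for converting its internal utf-8 to an external encoding, but whether that call is the right fit for ::rivet::escape_string is not established here.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not Rivet or Tcl code. */

static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %xx sequences and return the number of bytes written to dst.
 * dst must hold at least strlen(src) bytes. Malformed escapes are copied. */
static size_t percent_unescape(const char *src, unsigned char *dst)
{
    size_t n = 0;

    while (*src) {
        int hi, lo;
        if (src[0] == '%'
            && (hi = hex_value((unsigned char)src[1])) >= 0
            && (lo = hex_value((unsigned char)src[2])) >= 0) {
            dst[n++] = (unsigned char)(hi * 16 + lo);
            src += 3;
        } else {
            dst[n++] = (unsigned char)*src++;
        }
    }
    return n;
}

/* Convert utf-8 to ISO-8859-1. Only one-byte sequences and the two-byte
 * sequences for U+0080..U+00FF are accepted; anything else returns
 * (size_t)-1. dst must hold at least len bytes. */
static size_t utf8_to_latin1(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t i = 0, n = 0;

    while (i < len) {
        if (src[i] < 0x80) {
            dst[n++] = src[i++];
        } else if ((src[i] == 0xc2 || src[i] == 0xc3) && i + 1 < len
                   && (src[i + 1] & 0xc0) == 0x80) {
            dst[n++] = (unsigned char)(((src[i] & 0x03) << 6)
                                       | (src[i + 1] & 0x3f));
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

Unescaping %c3%a9 yields the two bytes 0xc3 0xa9, which a latin1 reader shows as two characters. Passing those two bytes through utf8_to_latin1 yields the single byte 0xe9, which is what a latin1 system would display as 'é'.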
I'm trying to replace Tcl_GetString... with Tcl_GetByteArrayFromObj (and
Tcl_NewByteArrayObj). The sequence of characters is correct, but there
is some extra stuff in it that breaks things.
Still working (and wasting time) on it.
-- Massimo