Hi Alan,

I think you misunderstand the issue. The problem is that a file
consists of bytes. In the old days, each byte corresponded to a
single character, but with the advent of UTF-8 and the like a single
character may be represented by one, two or more bytes. What a program
does with these bytes depends on which character encoding it
assumes.

For Tcl programs the following happens:
- Based on the system encoding, all sequences of bytes are translated
   into the UTF-8 byte sequences for the same characters.
- If the system encoding is NOT UTF-8, the internal resulting sequence
   may not be the same as in the file. For instance, on Windows "cp1252"
   is one way to connect the bytes above 127 to characters such as
   A-umlaut. So a byte that represents A-umlaut according to the cp1252
   encoding is translated to the UTF-8 sequence of bytes that represents
   that very same character. In other words: it is a completely different
   sequence of bytes.
- Right now we pass that _internal_ sequence of bytes to the PLplot C
   library - and assume that it was the original sequence of bytes.
   But that is only true if the system encoding is UTF-8.
   The code I propose as an alternative reverses the translation.
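
The three steps above can be traced byte by byte. The following is only a
sketch: it uses Python's codecs as a stand-in for Tcl's encoding machinery,
and the helper name trace_bytes is made up for illustration. It is not the
PLplot or Tcl code.

```python
# Illustration only: Python stands in for Tcl; trace_bytes is a hypothetical helper.

def trace_bytes(file_bytes: bytes, system_encoding: str):
    """Follow one byte sequence through the three steps described above."""
    # Step 1: the bytes are interpreted as characters using the system encoding.
    text = file_bytes.decode(system_encoding)
    # Step 2: those characters are held internally as UTF-8 bytes.
    internal = text.encode("utf-8")
    # Step 3 (the proposed fix): convert back to the system encoding
    # before handing the bytes to the C library.
    restored = text.encode(system_encoding)
    return internal, restored

# A-umlaut stored as the single cp1252 byte 0xC4.
internal, restored = trace_bytes(b"\xc4", "cp1252")
print(internal.hex(), restored.hex())      # c384 c4

# With a UTF-8 system encoding the internal bytes equal the file bytes.
internal_u, restored_u = trace_bytes(b"\xc3\x84", "utf-8")
print(internal_u.hex(), restored_u.hex())  # c384 c384
```

In the cp1252 case the internal bytes (c3 84) differ from the file byte (c4),
and only the reversed translation gives back the original.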

Bytes less than or equal to 127 represent exactly the same characters in
cp1252 and UTF-8 (by design), so most examples are not affected by this
distinction.
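
This can be checked directly. Again a sketch in Python, not Tcl, used only
because its codecs make the comparison short:

```python
# Illustration only: compare how cp1252 and UTF-8 treat bytes 0..127.
ascii_bytes = bytes(range(128))
same_below_128 = ascii_bytes.decode("cp1252") == ascii_bytes.decode("utf-8")
print(same_below_128)   # True

# Above 127 the two encodings diverge, e.g. for A-umlaut (U+00C4).
a_umlaut = "\u00c4"
differs_above_127 = a_umlaut.encode("cp1252") != a_umlaut.encode("utf-8")
print(differs_above_127)   # True
```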

(I agree this is highly confusing - but if you simply think of
bytes as separate from characters it becomes a bit easier.)

Regards,

Arjen

On 2010-12-21 10:55, Alan W. Irwin wrote:
> On 2010-12-21 09:25+0100 Arjen Markus wrote:
> 
>> I solved it, based on a suggestion from a fellow Tcler: we need to
>> pass the string to a Tcl_UtfToExternal*() function. The problem on
>> Windows is that the system encoding is _not_ UTF-8, but cp1252.
>> The strings in the source are therefore translated from cp1252 to UTF-8,
>> so that the resulting string represents these characters. Using said
>> function undoes this translation: the PLplot routines receive the
>> original sequence of bytes.
> 
> I think this explanation must be incomplete or incorrect. The reason I
> say this is our Tcl example files, e.g., x24.tcl, (and our C code as
> well, for that matter) are all encoded directly in UTF-8. So when I
> use the emacs editor to look at x24.tcl, the "Peace" words below
> 
> set peace {
> 
> are displayed correctly for every language.  Apparently it would be
> impossible to encode those peace words as cp1252.  That encoding is
> quite limited and very close to ISO-8859-1 according to the Wikipedia
> article about it, and thus would not be able to represent, e.g., the
> Mandarin, Arabic, etc. words for Peace that I see when editing that
> file with emacs.
> 
> On Linux, when the pltcl executable interprets those UTF-8 encoded source
> files, it does everything as expected and passes those strings directly
> as UTF-8 to the PLplot core library.  (I have just double-checked
> that by running "../../utils/pltcl x24 -dev pngcairo -o test.png" in the
> build-tree examples/tcl subdirectory on Linux, and the result was a
> perfect Peace flag just like we get on Linux with the x24c executable
> that is written in C (as can be seen by looking at, e.g.,
> http://plplot.sourceforge.net/examples-data/demo24/x24.01.png).
> 
> Arjen, what happens for the x24c executable on Windows for a modern
> device such as pngcairo?  (My wine fonts are quite limited so I just
> get blanks for the exotic languages so it is not a definitive test).
> If your x24c results with pngcairo are similar to the above result
> displayed on our web site, then all is well in Windows C, and the only
> remaining question is what is going on in Tcl for Windows that is so
> different from C for Windows in how it deals with source files that
> are encoded in UTF-8.
> 
> Alan
> __________________________
> Alan W. Irwin
> 
> Astronomical research affiliation with Department of Physics and Astronomy,
> University of Victoria (astrowww.phys.uvic.ca).
> 
> Programming affiliations with the FreeEOS equation-of-state implementation
> for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
> package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
> Linux Links project (loll.sf.net); and the Linux Brochure Project
> (lbproject.sf.net).
> __________________________
> 
> Linux-powered Science
> __________________________
> 
 

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
