On Feb 13, 2009, at 4:38 AM, Bryan Jurish wrote:

> moin all,
>
> On 2009-02-13 03:14:20, Hans-Christoph Steiner <[email protected]>
> appears to have written:
>> On Thu, 12 Feb 2009, Bryan Jurish wrote:
>>> Are we certain that Tk is actually translating at all, and not just
>>> using some 8-bit default like latin-1 when it finds non-UTF-8 input?
>>> I ask because that's what Perl does by default, a behavior which
>>> continues to give me headaches. In Perl, each string has its own
>>> internal "utf8" flag which tells you whether Perl is currently
>>> thinking of that string as a raw byte-string in some unknown
>>> encoding or as a "native" (utf8) character string... I assume Tcl/Tk
>>> does something similar, but don't know how to test for this property
>>> there.
>>
>> Here's the doc that I read on this topic, but it probably doesn't have
>> the level of detail that you require:
>>
>> http://tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm#M8
>
> Had a look at that last night, but the 'fconfigure' command only
> applies to Tcl streams (analogous to the PerlIO layer, which I abhor
> and try my best to avoid, as it doesn't provide a sufficient level of
> control for most of my purposes... fconfigure would be ok for Pd-devel
> if we say we're dealing exclusively with utf-8... but then again, I
> don't know if Tcl streams ("channels") are used at all by the GUI...
> maybe on the socket to the backend, but that's probably it; IMHO it's
> safer to explicitly generate byte strings in a known encoding and just
> pass those around).
>
> Also useful is the 'encoding' command family ('encoding convertfrom',
> 'encoding convertto', 'encoding names', 'encoding system'). Tried this
> with some explicit escapes as well as a tester widget from
> http://en.wikibooks.org/wiki/Tcl_Programming/Internationalization, and
> I get decent display (Japanese still doesn't display with any Tk fonts
> I tried, but I think that's just a font problem). Also tested the bind
> substitutions with a dummy "puts" script, and managed to get real
> utf-8 sent out over the stdout channel for keyboard input. Still not
> 100% sure how well it's working, since my keyboard only produces
> latin-1 symbols (maybe I'll hack my xmodmap for some real testing ;-)
>
> Unfortunately, I still haven't found a way to get Tcl to tell me what
> encoding (if any) it thinks a given string is using, analogous to the
> Perl predicate "utf8::is_utf8($string)". Maybe Tcl doesn't track this
> information on a per-string level at all, but assumes [encoding system]
> for all strings? That seems pretty inflexible to me, but after another
> look at http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm , it does
> indeed seem to be the case. So I guess the only safe way to handle
> things is (as you suggest) to select an internal encoding (e.g. UTF-8)
> and enforce its use with {encoding system "utf-8"}, and possibly
> {fconfigure $ch -encoding "utf-8"} for whatever channels we want. The
> fconfigure manpage says the default channel encoding is [encoding
> system]; but I suspect that it's really the value of [encoding system]
> at the time of the channel's opening which has an effect, so we either
> have to make some accommodations for the standard channels
> (stdin, stdout, stderr), or just leave that up to Tcl (which probably
> defaults to the current locale's LC_CTYPE, but I haven't tested that
> yet)...
>
>> As for Tk hacking for Pd, a big part of the pd-devel effort is to make
>> the Tk GUI code readable, and even extendable! Feel free to hit me
>> with questions, either here, or I am in #dataflow quite a bit these
>> days.
>
> Groovy. I don't think I'll make the devel meeting today, but it's
> beginning to look as if I've got a bit of a bug in my bonnet about
> this ;-)
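
The difference Bryan is asking about (strict UTF-8 decoding versus a silent 8-bit fallback such as latin-1) can be shown outside Tcl. What follows is only a minimal sketch in Python, chosen for illustration; it does not claim to reproduce what Tk or Perl do internally, and the helper name decode_with_fallback is hypothetical.

```python
def decode_with_fallback(raw):
    """Try strict UTF-8 first; fall back to latin-1 and report which one was used.

    Hypothetical helper for illustration only. latin-1 maps every byte to a
    code point, so the fallback never fails, which is exactly why a silent
    fallback can hide encoding mistakes.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


# The same character as two different byte strings.
utf8_bytes = "é".encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = "é".encode("latin-1")  # one byte: 0xE9

# Valid UTF-8 is decoded as UTF-8.
print(decode_with_fallback(utf8_bytes))

# A lone 0xE9 is not valid UTF-8, so the fallback kicks in.
print(decode_with_fallback(latin1_bytes))

# Reading UTF-8 bytes as latin-1 "works" but yields two wrong characters.
print(utf8_bytes.decode("latin-1"))
```

Returning the encoding alongside the text makes the fallback visible to the caller, which is the point of passing around byte strings in a known encoding rather than guessing later.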

Hey,

It's good to see someone willing to dive in deep. It'll be great to have full UTF-8 support. Patko and I were looking into how to do it on the C side; I think what you mentioned, using locale.h and setlocale(), should be enough. Maybe Patko will chime in with some details.

.hc

>
>
> marmosets,
> Bryan
>
> --
> Bryan Jurish                  "There is *always* one more bug."
> [email protected]              -Lubarsky's Law of Cybernetic Entomology

----------------------------------------------------------------------------

Programs should be written for people to read, and only incidentally for machines to execute. - from Structure and Interpretation of Computer Programs
