127 is 'delete' -- ASCII all right, but not 'printable'.

cheers
Miller
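For anyone following along, here is a minimal sketch of how the patched test seems to sort key numbers, going by the quoted patch below. The function name key_insert_nbytes() is hypothetical and not part of the patch. The upper bound 0x110000 follows u8_wc_toutf8() in the patch, not u8_wc_nbytes(), which uses 0x200000 for the 4-byte case.

```c
#include <assert.h>

/* Hypothetical helper, NOT part of the patch: returns how many bytes
   would be inserted into the text buffer for key number n, or 0 if the
   key is not inserted as text. */
int key_insert_nbytes(unsigned int n)
{
    /* newline and printable 7-bit ASCII: one byte, copied as-is */
    if (n == '\n' || (n > 31 && n < 127))
        return 1;
    /* remaining control characters, including DEL (127): not text */
    if (n <= 127)
        return 0;
    /* beyond ASCII: assume n is a Unicode codepoint, count UTF-8 bytes */
    if (n < 0x800)
        return 2;
    if (n < 0x10000)
        return 3;
    if (n < 0x110000)
        return 4;
    return 0; /* outside the Unicode range */
}

/* Small self-check; call it from main() when trying this out. */
void key_insert_selfcheck(void)
{
    assert(key_insert_nbytes('~') == 1);    /* 126, last printable ASCII */
    assert(key_insert_nbytes(127) == 0);    /* DEL is ASCII, not printable */
    assert(key_insert_nbytes(0xE4) == 2);   /* U+00E4 */
    assert(key_insert_nbytes(0x20AC) == 3); /* U+20AC */
}
```

So with n < 127 the DEL key never lands in the buffer as a character, and anything above 127 falls through to the multibyte branch.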
On Tue, Jan 19, 2010 at 09:37:08PM -0500, Hans-Christoph Steiner wrote: > > Looks good to me. One comment, shouldn't this be n<128? 127 is an > ASCII char, AFAIK. > > + if (n == '\n' || (n > 31 && n < 127)) > > It looks worth checking to me, hopefully we can get Miller and others > to weigh in on it. > > .hc > > On Jan 19, 2010, at 4:16 PM, Bryan Jurish wrote: > > >morning all, > > > >attached is a UTF-8 support patch against branches/pd-gui-rewrite/0.43 > >revision 13051 (HEAD as of an hour or so ago). most of the bulk is > >new > >files (s_utf8.c, s_utf8.h), most other changes are in g_rtext.c. It's > >not too monstrous, and I've tested it again here briefly with some > >utf-8 > >test patches (see other attachment), and things appear to be working > >as > >expected. if desired, I can check this in; otherwise feel free to > >do it > >for me ;-) > > > >2 annoying things here during testing (I don't see how my patches > >could > >have caused this, but you never know): > > > >(1) all loaded patch windows appear at +0+0 (upper left corner), which > >with my wm (windowmaker) means the title bar is off the screen, and I > >have to resort to keyboard shortcuts to get them mouse-draggable, > >which > >is a major pain in the wazoo: is this a known bug? > > > >(2) I can't figure out how to get at the properties dialog for number, > >number2, or any other gui-atom objects: should these be working > >already? > > > >marmosets, > > Bryan > > > >On 2010-01-18 23:09:34, Hans-Christoph Steiner <[email protected]> > >appears to > >have written: > >> > >>Awesome! If its big and complicated, I say post it to the list > >>first, > >>if not too bad, then just commit. 
> >> > >>.hc > >> > >>On Jan 18, 2010, at 4:47 AM, Bryan Jurish wrote: > >> > >>>moin Hans, moin list, > >>> > >>>I think perhaps I never actually did post the cleaned-up patch > >>>anywhere > >>>(bad programmer, no biscuit); I guess I'll check out > >>>branches/pd-gui-rewrite/0.43 and try patching my changes in; then > >>>I can > >>>either commit or just post the (updated) patch. Hopefully no major > >>>additional changes will be required, so it ought to go pretty fast. > >>> > >>>marmosets, > >>> Bryan > >>> > >>>On 2010-01-17 22:57:33, Hans-Christoph Steiner <[email protected]> > >>>appears to > >>>have written: > >>>> > >>>>Hey Bryan, > >>>> > >>>>I'd like to try to get your UTF-8 code into pd-gui-rewrite. You > >>>>mention > >>>>in this posting back in May that you had the whole thing > >>>>working. I > >>>>couldn't find the diff/patch for this. Is it posted anywhere? > >>>>Do you > >>>>want to try to check it in yourself directly to the pd-gui- > >>>>rewrite/0.43 > >>>>branch? > >>>> > >>>>.hc > >>>> > >>>> > >>>>On Mar 20, 2009, at 6:16 PM, Bryan Jurish wrote: > >>>> > >>>>>morning all, > >>>>> > >>>>>Of course I never really like to see my code wither away in the > >>>>>bit > >>>>>bucket, but I personally don't have any pressing need for UTF-8 > >>>>>symbols, > >>>>>comments, etc. in Pd -- I'm a native English speaker, after > >>>>>all ;-) > >>>>> > >>>>>Also, my changes are by no means the only way to do it (or even > >>>>>the > >>>>>best > >>>>>way); we could gain a little speed by slapping on some more > >>>>>buffers > >>>>>(mostly and possibly only in rtext_senditup()), but since this > >>>>>seems to > >>>>>effect only GUI/editing stuff, I think we can live with a > >>>>>smidgeon of > >>>>>additional cpu time ... after all, it's all O(n) anyways. > >>>>> > >>>>>Really I just wanted to see how easy (or difficult) it would be > >>>>>to get > >>>>>Pd to use UTF-8 as its internal encoding... 
turned out to be > >>>>>harder > >>>>>than > >>>>>I had thought, but (ever so slightly) easier than I had feared :-/ > >>>>> > >>>>>marmosets, > >>>>> Bryan > >>>>> > >>>>>On 2009-03-20 18:39:06, Hans-Christoph Steiner <[email protected]> > >>>>>appears to > >>>>>have written: > >>>>>> > >>>>>>I wonder what the best approach is to getting it included. I > >>>>>>also > >>>>>>think > >>>>>>its a very valuable contribution. I think we need to first get > >>>>>>the > >>>>>>Tcl/Tk only changes done, since that was the mandate of the pd- > >>>>>>devel > >>>>>>0.41 effort. Then once Miller has accepted those changes, then > >>>>>>we can > >>>>>>start with the C modifications there. So how to proceed next, > >>>>>>I think > >>>>>>is based on how eager you are, Bryan, to getting this in a > >>>>>>regular > >>>>>>build. > >>>>>> > >>>>>>One option is making a pd-devel-utf8 branch, another is posting > >>>>>>these > >>>>>>patches to the patch tracker and waiting for Miller to make his > >>>>>>next > >>>>>>update with the Pd-devel Tcl-Tk code. > >>>>>> > >>>>>>Maybe we can get Miller to chime in on this topic. > >>>>>> > >>>>>>.hc > >>>>>> > >>>>>>On Mar 13, 2009, at 12:00 AM, dmotd wrote: > >>>>>> > >>>>>>>hey bryan, > >>>>>>> > >>>>>>>just a quick note of a appreciation for getting this one out.. > >>>>>>>i hope > >>>>>>>it gets > >>>>>>>picked up in millers build soon.. a very useful and necessary > >>>>>>>modification. > >>>>>>> > >>>>>>>well done! > >>>>>>> > >>>>>>>dmotd > >>>>>>> > >>>>>>>On Thursday 12 March 2009 08:07:50 Bryan Jurish wrote: > >>>>>>>>moin folks, > >>>>>>>> > >>>>>>>>I believe I've finally got pd-devel 0.41-4 using UTF-8 across > >>>>>>>>the > >>>>>>>>board. > >>>>>>>>So far, I've tested message boxes & comments (g_rtext), as > >>>>>>>>well as > >>>>>>>>symbol atoms, and all seems good. 
I think we can still expect > >>>>>>>>goofiness > >>>>>>>>if someone names an abstraction using a multibyte character > >>>>>>>>when the > >>>>>>>>filesystem isn't UTF-8 encoded (raw 8-bit works for me here > >>>>>>>>too), > >>>>>>>>but I > >>>>>>>>really don't want to open that particular can of worms. > >>>>>>>> > >>>>>>>>So I guess I have 2 questions: > >>>>>>>> > >>>>>>>>(1) what should I call the generic UTF-8 source files? (see > >>>>>>>>my other > >>>>>>>>post) > >>>>>>>> > >>>>>>>>(2) shall I commit these changes to pd-devel/0.41-4, or > >>>>>>>>somewhere > >>>>>>>>else, > >>>>>>>>or just post a diff (ca. 33k, ought to be easier to read now; > >>>>>>>>I've > >>>>>>>>tried > >>>>>>>>to follow the indentation conventions of the source files I > >>>>>>>>modified)? > >>>>>>>> > >>>>>>>>marmosets, > >>>>>>>> Bryan > >>>>> > >>>>>-- > >>>>>Bryan Jurish "There is *always* one more > >>>>>bug." > >>>>>[email protected] -Lubarsky's Law of Cybernetic > >>>>>Entomology > >>>> > >>>> > >>>> > >>>>---------------------------------------------------------------------------- > >>>> > >>>> > >>>> > >>>>The arc of history bends towards justice. - Dr. Martin Luther > >>>>King, Jr. > >>>> > >>>> > >>> > >>>-- > >>>*************************************************** > >>> > >>>Bryan Jurish > >>>Deutsches Textarchiv > >>>Berlin-Brandenburgische Akademie der Wissenschaften > >>> > >>>J?gerstr. 22/23 > >>>10117 Berlin > >>> > >>>Tel.: +49 (0)30 20370 539 > >>>E-Mail: [email protected] > >>> > >>>*************************************************** > >>> > >> > >> > >> > >>---------------------------------------------------------------------------- > >> > >> > >>As we enjoy great advantages from inventions of others, we should be > >>glad of an opportunity to serve others by any invention of ours; and > >>this we should do freely and generously. - Benjamin Franklin > >> > >> > >> > > > >-- > >Bryan Jurish "There is *always* one more bug." 
> >[email protected] -Lubarsky's Law of Cybernetic Entomology > >Index: src/Makefile.am > >=================================================================== > >--- src/Makefile.am (revision 13051) > >+++ src/Makefile.am (working copy) > >@@ -24,6 +24,7 @@ > > m_conf.c m_glob.c m_sched.c \ > > s_main.c s_inter.c s_file.c s_print.c \ > > s_loader.c s_path.c s_entry.c s_audio.c s_midi.c \ > >+ s_utf8.c \ > > d_ugen.c d_ctl.c d_arithmetic.c d_osc.c d_filter.c d_dac.c > >d_misc.c \ > > d_math.c d_fft.c d_array.c d_global.c \ > > d_delay.c d_resample.c \ > >Index: src/g_editor.c > >=================================================================== > >--- src/g_editor.c (revision 13051) > >+++ src/g_editor.c (working copy) > >@@ -9,6 +9,7 @@ > >#include "s_stuff.h" > >#include "g_canvas.h" > >#include <string.h> > >+#include "s_utf8.h" /*-- moo --*/ > > > >void glist_readfrombinbuf(t_glist *x, t_binbuf *b, char *filename, > > int selectem); > >@@ -1666,8 +1667,9 @@ > > gotkeysym = av[1].a_w.w_symbol; > > else if (av[1].a_type == A_FLOAT) > > { > >- char buf[3]; > >- sprintf(buf, "%c", (int)(av[1].a_w.w_float)); > >+ /*-- moo: assume keynum is a Unicode codepoint; encode as > >UTF-8 --*/ > >+ char buf[UTF8_MAXBYTES1]; > >+ u8_wc_toutf8_nul(buf, (UCS4)(av[1].a_w.w_float)); > > gotkeysym = gensym(buf); > > } > > else gotkeysym = gensym("?"); > >Index: src/s_utf8.c > >=================================================================== > >--- src/s_utf8.c (revision 0) > >+++ src/s_utf8.c (revision 0) > >@@ -0,0 +1,280 @@ > >+/* > >+ Basic UTF-8 manipulation routines > >+ by Jeff Bezanson > >+ placed in the public domain Fall 2005 > >+ > >+ This code is designed to provide the utilities you need to > >manipulate > >+ UTF-8 as an internal string encoding. These functions do not > >perform the > >+ error checking normally needed when handling UTF-8 data, so if > >you happen > >+ to be from the Unicode Consortium you will want to flay me alive. 
> >+ I do this because error checking can be performed at the > >boundaries (I/O), > >+ with these routines reserved for higher performance on data known > >to be > >+ valid. > >+ > >+ modified by Bryan Jurish (moo) March 2009 > >+ + removed some unneeded functions (escapes, printf etc), added > >others > >+*/ > >+#include <stdlib.h> > >+#include <stdio.h> > >+#include <string.h> > >+#include <stdarg.h> > >+#ifdef WIN32 > >+#include <malloc.h> > >+#else > >+#include <alloca.h> > >+#endif > >+ > >+#include "s_utf8.h" > >+ > >+static const u_int32_t offsetsFromUTF8[6] = { > >+ 0x00000000UL, 0x00003080UL, 0x000E2080UL, > >+ 0x03C82080UL, 0xFA082080UL, 0x82082080UL > >+}; > >+ > >+static const char trailingBytesForUTF8[256] = { > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, > >+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, > >+ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 > >+}; > >+ > >+ > >+/* returns length of next utf-8 sequence */ > >+int u8_seqlen(char *s) > >+{ > >+ return trailingBytesForUTF8[(unsigned int)(unsigned char)s[0]] > >+ 1; > >+} > >+ > >+/* conversions without error checking > >+ only works for valid UTF-8, i.e. no 5- or 6-byte sequences > >+ srcsz = source size in bytes, or -1 if 0-terminated > >+ sz = dest size in # of wide characters > >+ > >+ returns # characters converted > >+ dest will always be L'\0'-terminated, even if there isn't enough > >room > >+ for all the characters. > >+ if sz = srcsz+1 (i.e. 4*srcsz+4 bytes), there will always be > >enough space. 
> >+*/ > >+int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz) > >+{ > >+ u_int32_t ch; > >+ char *src_end = src + srcsz; > >+ int nb; > >+ int i=0; > >+ > >+ while (i < sz-1) { > >+ nb = trailingBytesForUTF8[(unsigned char)*src]; > >+ if (srcsz == -1) { > >+ if (*src == 0) > >+ goto done_toucs; > >+ } > >+ else { > >+ if (src + nb >= src_end) > >+ goto done_toucs; > >+ } > >+ ch = 0; > >+ switch (nb) { > >+ /* these fall through deliberately */ > >+#if UTF8_SUPPORT_FULL_UCS4 > >+ case 5: ch += (unsigned char)*src++; ch <<= 6; > >+ case 4: ch += (unsigned char)*src++; ch <<= 6; > >+#endif > >+ case 3: ch += (unsigned char)*src++; ch <<= 6; > >+ case 2: ch += (unsigned char)*src++; ch <<= 6; > >+ case 1: ch += (unsigned char)*src++; ch <<= 6; > >+ case 0: ch += (unsigned char)*src++; > >+ } > >+ ch -= offsetsFromUTF8[nb]; > >+ dest[i++] = ch; > >+ } > >+ done_toucs: > >+ dest[i] = 0; > >+ return i; > >+} > >+ > >+/* srcsz = number of source characters, or -1 if 0-terminated > >+ sz = size of dest buffer in bytes > >+ > >+ returns # characters converted > >+ dest will only be '\0'-terminated if there is enough space. this > >is > >+ for consistency; imagine there are 2 bytes of space left, but > >the next > >+ character requires 3 bytes. in this case we could NUL-terminate, > >but in > >+ general we can't when there's insufficient space. therefore this > >function > >+ only NUL-terminates if all the characters fit, and there's space > >for > >+ the NUL as well. > >+ the destination string will never be bigger than the source > >string. > >+*/ > >+int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz) > >+{ > >+ u_int32_t ch; > >+ int i = 0; > >+ char *dest_end = dest + sz; > >+ > >+ while (srcsz<0 ? 
src[i]!=0 : i < srcsz) { > >+ ch = src[i]; > >+ if (ch < 0x80) { > >+ if (dest >= dest_end) > >+ return i; > >+ *dest++ = (char)ch; > >+ } > >+ else if (ch < 0x800) { > >+ if (dest >= dest_end-1) > >+ return i; > >+ *dest++ = (ch>>6) | 0xC0; > >+ *dest++ = (ch & 0x3F) | 0x80; > >+ } > >+ else if (ch < 0x10000) { > >+ if (dest >= dest_end-2) > >+ return i; > >+ *dest++ = (ch>>12) | 0xE0; > >+ *dest++ = ((ch>>6) & 0x3F) | 0x80; > >+ *dest++ = (ch & 0x3F) | 0x80; > >+ } > >+ else if (ch < 0x110000) { > >+ if (dest >= dest_end-3) > >+ return i; > >+ *dest++ = (ch>>18) | 0xF0; > >+ *dest++ = ((ch>>12) & 0x3F) | 0x80; > >+ *dest++ = ((ch>>6) & 0x3F) | 0x80; > >+ *dest++ = (ch & 0x3F) | 0x80; > >+ } > >+ i++; > >+ } > >+ if (dest < dest_end) > >+ *dest = '\0'; > >+ return i; > >+} > >+ > >+/* moo: get byte length of character number, or 0 if not supported */ > >+int u8_wc_nbytes(u_int32_t ch) > >+{ > >+ if (ch < 0x80) return 1; > >+ if (ch < 0x800) return 2; > >+ if (ch < 0x10000) return 3; > >+ if (ch < 0x200000) return 4; > >+#if UTF8_SUPPORT_FULL_UCS4 > >+ /*-- moo: support full UCS-4 range? 
--*/ > >+ if (ch < 0x4000000) return 5; > >+ if (ch < 0x7fffffffUL) return 6; > >+#endif > >+ return 0; /*-- bad input --*/ > >+} > >+ > >+int u8_wc_toutf8(char *dest, u_int32_t ch) > >+{ > >+ if (ch < 0x80) { > >+ dest[0] = (char)ch; > >+ return 1; > >+ } > >+ if (ch < 0x800) { > >+ dest[0] = (ch>>6) | 0xC0; > >+ dest[1] = (ch & 0x3F) | 0x80; > >+ return 2; > >+ } > >+ if (ch < 0x10000) { > >+ dest[0] = (ch>>12) | 0xE0; > >+ dest[1] = ((ch>>6) & 0x3F) | 0x80; > >+ dest[2] = (ch & 0x3F) | 0x80; > >+ return 3; > >+ } > >+ if (ch < 0x110000) { > >+ dest[0] = (ch>>18) | 0xF0; > >+ dest[1] = ((ch>>12) & 0x3F) | 0x80; > >+ dest[2] = ((ch>>6) & 0x3F) | 0x80; > >+ dest[3] = (ch & 0x3F) | 0x80; > >+ return 4; > >+ } > >+ return 0; > >+} > >+ > >+/*-- moo --*/ > >+int u8_wc_toutf8_nul(char *dest, u_int32_t ch) > >+{ > >+ int sz = u8_wc_toutf8(dest,ch); > >+ dest[sz] = '\0'; > >+ return sz; > >+} > >+ > >+/* charnum => byte offset */ > >+int u8_offset(char *str, int charnum) > >+{ > >+ int offs=0; > >+ > >+ while (charnum > 0 && str[offs]) { > >+ (void)(isutf(str[++offs]) || isutf(str[++offs]) || > >+ isutf(str[++offs]) || ++offs); > >+ charnum--; > >+ } > >+ return offs; > >+} > >+ > >+/* byte offset => charnum */ > >+int u8_charnum(char *s, int offset) > >+{ > >+ int charnum = 0, offs=0; > >+ > >+ while (offs < offset && s[offs]) { > >+ (void)(isutf(s[++offs]) || isutf(s[++offs]) || > >+ isutf(s[++offs]) || ++offs); > >+ charnum++; > >+ } > >+ return charnum; > >+} > >+ > >+/* reads the next utf-8 sequence out of a string, updating an index > >*/ > >+u_int32_t u8_nextchar(char *s, int *i) > >+{ > >+ u_int32_t ch = 0; > >+ int sz = 0; > >+ > >+ do { > >+ ch <<= 6; > >+ ch += (unsigned char)s[(*i)++]; > >+ sz++; > >+ } while (s[*i] && !isutf(s[*i])); > >+ ch -= offsetsFromUTF8[sz-1]; > >+ > >+ return ch; > >+} > >+ > >+/* number of characters */ > >+int u8_strlen(char *s) > >+{ > >+ int count = 0; > >+ int i = 0; > >+ > >+ while (u8_nextchar(s, &i) != 0) > >+ count++; > >+ > 
>+ return count; > >+} > >+ > >+void u8_inc(char *s, int *i) > >+{ > >+ (void)(isutf(s[++(*i)]) || isutf(s[++(*i)]) || > >+ isutf(s[++(*i)]) || ++(*i)); > >+} > >+ > >+void u8_dec(char *s, int *i) > >+{ > >+ (void)(isutf(s[--(*i)]) || isutf(s[--(*i)]) || > >+ isutf(s[--(*i)]) || --(*i)); > >+} > >+ > >+/*-- moo --*/ > >+void u8_inc_ptr(char **sp) > >+{ > >+ (void)(isutf(*(++(*sp))) || isutf(*(++(*sp))) || > >+ isutf(*(++(*sp))) || ++(*sp)); > >+} > >+ > >+/*-- moo --*/ > >+void u8_dec_ptr(char **sp) > >+{ > >+ (void)(isutf(*(--(*sp))) || isutf(*(--(*sp))) || > >+ isutf(*(--(*sp))) || --(*sp)); > >+} > >Index: src/g_rtext.c > >=================================================================== > >--- src/g_rtext.c (revision 13051) > >+++ src/g_rtext.c (working copy) > >@@ -13,6 +13,7 @@ > >#include "m_pd.h" > >#include "s_stuff.h" > >#include "g_canvas.h" > >+#include "s_utf8.h" > > > > > >#define LMARGIN 2 > >@@ -32,10 +33,10 @@ > > > >struct _rtext > >{ > >- char *x_buf; > >- int x_bufsize; > >- int x_selstart; > >- int x_selend; > >+ char *x_buf; /*-- raw byte string, assumed UTF-8 encoded > >(moo) --*/ > >+ int x_bufsize; /*-- byte length --*/ > >+ int x_selstart; /*-- byte offset --*/ > >+ int x_selend; /*-- byte offset --*/ > > int x_active; > > int x_dragfrom; > > int x_height; > >@@ -119,6 +120,15 @@ > > > >/* LATER deal with tcl-significant characters */ > > > >+/* firstone(), lastone() > >+ * + returns byte offset of (first|last) occurrence of 'c' in > >'s[0..n-1]', or > >+ * -1 if none was found > >+ * + 's' is a raw byte string > >+ * + 'c' is a byte value > >+ * + 'n' is the length (in bytes) of the prefix of 's' to be > >searched. > >+ * + we could make these functions work on logical characters in > >utf8 strings, > >+ * but we don't really need to... > >+ */ > >static int firstone(char *s, int c, int n) > >{ > > char *s2 = s + n; > >@@ -155,6 +165,16 @@ > > of the entire text in pixels. 
> > */ > > > >+ /*-- moo: > >+ * + some variables from the original version have been renamed > >+ * + variables with a "_b" suffix are raw byte strings, lengths, > >or offsets > >+ * + variables with a "_c" suffix are logical character lengths > >or offsets > >+ * (assuming valid UTF-8 encoded byte string in x->x_buf) > >+ * + a fair amount of O(n) computations required to convert > >between raw byte > >+ * offsets (needed by the C side) and logical character > >offsets (needed by > >+ * the GUI) > >+ */ > >+ > > /* LATER get this and sys_vgui to work together properly, > > breaking up messages as needed. As of now, there's > > a limit of 1950 characters, imposed by sys_vgui(). */ > >@@ -171,14 +191,16 @@ > >{ > > t_float dispx, dispy; > > char smallbuf[200], *tempbuf; > >- int outchars = 0, nlines = 0, ncolumns = 0, > >+ int outchars_b = 0, nlines = 0, ncolumns = 0, > > pixwide, pixhigh, font, fontwidth, fontheight, findx, findy; > > int reportedindex = 0; > > t_canvas *canvas = glist_getcanvas(x->x_glist); > >- int widthspec = x->x_text->te_width; > >- int widthlimit = (widthspec ? widthspec : BOXWIDTH); > >- int inindex = 0; > >- int selstart = 0, selend = 0; > >+ int widthspec_c = x->x_text->te_width; > >+ int widthlimit_c = (widthspec_c ? widthspec_c : BOXWIDTH); > >+ int inindex_b = 0; > >+ int inindex_c = 0; > >+ int selstart_b = 0, selend_b = 0; > >+ int x_bufsize_c = u8_charnum(x->x_buf, x->x_bufsize); > > /* if we're a GOP (the new, "goprect" style) borrow the font > >size > > from the inside to preserve the spacing */ > > if (pd_class(&x->x_text->te_pd) == canvas_class && > >@@ -193,65 +215,76 @@ > > if (x->x_bufsize >= 100) > > tempbuf = (char *)t_getbytes(2 * x->x_bufsize + 1); > > else tempbuf = smallbuf; > >- while (x->x_bufsize - inindex > 0) > >+ while (x_bufsize_c - inindex_c > 0) > > { > >- int inchars = x->x_bufsize - inindex; > >- int maxindex = (inchars > widthlimit ? 
widthlimit : inchars); > >+ int inchars_b = x->x_bufsize - inindex_b; > >+ int inchars_c = x_bufsize_c - inindex_c; > >+ int maxindex_c = (inchars_c > widthlimit_c ? widthlimit_c : > >inchars_c); > >+ int maxindex_b = u8_offset(x->x_buf + inindex_b, maxindex_c); > > int eatchar = 1; > >- int foundit = firstone(x->x_buf + inindex, '\n', maxindex); > >- if (foundit < 0) > >+ int foundit_b = firstone(x->x_buf + inindex_b, '\n', > >maxindex_b); > >+ int foundit_c; > >+ if (foundit_b < 0) > > { > >- if (inchars > widthlimit) > >+ if (inchars_c > widthlimit_c) > > { > >- foundit = lastone(x->x_buf + inindex, ' ', maxindex); > >- if (foundit < 0) > >+ foundit_b = lastone(x->x_buf + inindex_b, ' ', > >maxindex_b); > >+ if (foundit_b < 0) > > { > >- foundit = maxindex; > >+ foundit_b = maxindex_b; > >+ foundit_c = maxindex_c; > > eatchar = 0; > > } > >+ else > >+ foundit_c = u8_charnum(x->x_buf + inindex_b, > >foundit_b); > > } > > else > > { > >- foundit = inchars; > >+ foundit_b = inchars_b; > >+ foundit_c = inchars_c; > > eatchar = 0; > > } > > } > >+ else > >+ foundit_c = u8_charnum(x->x_buf + inindex_b, foundit_b); > >+ > > if (nlines == findy) > > { > > int actualx = (findx < 0 ? 0 : > >- (findx > foundit ? foundit : findx)); > >- *indexp = inindex + actualx; > >+ (findx > foundit_c ? 
foundit_c : findx)); > >+ *indexp = inindex_b + u8_offset(x->x_buf + inindex_b, > >actualx); > > reportedindex = 1; > > } > >- strncpy(tempbuf+outchars, x->x_buf + inindex, foundit); > >- if (x->x_selstart >= inindex && > >- x->x_selstart <= inindex + foundit + eatchar) > >- selstart = x->x_selstart + outchars - inindex; > >- if (x->x_selend >= inindex && > >- x->x_selend <= inindex + foundit + eatchar) > >- selend = x->x_selend + outchars - inindex; > >- outchars += foundit; > >- inindex += (foundit + eatchar); > >- if (inindex < x->x_bufsize) > >- tempbuf[outchars++] = '\n'; > >- if (foundit > ncolumns) > >- ncolumns = foundit; > >+ strncpy(tempbuf+outchars_b, x->x_buf + inindex_b, foundit_b); > >+ if (x->x_selstart >= inindex_b && > >+ x->x_selstart <= inindex_b + foundit_b + eatchar) > >+ selstart_b = x->x_selstart + outchars_b - inindex_b; > >+ if (x->x_selend >= inindex_b && > >+ x->x_selend <= inindex_b + foundit_b + eatchar) > >+ selend_b = x->x_selend + outchars_b - inindex_b; > >+ outchars_b += foundit_b; > >+ inindex_b += (foundit_b + eatchar); > >+ inindex_c += (foundit_c + eatchar); > >+ if (inindex_b < x->x_bufsize) > >+ tempbuf[outchars_b++] = '\n'; > >+ if (foundit_c > ncolumns) > >+ ncolumns = foundit_c; > > nlines++; > > } > > if (!reportedindex) > >- *indexp = outchars; > >+ *indexp = outchars_b; > > dispx = text_xpix(x->x_text, x->x_glist); > > dispy = text_ypix(x->x_text, x->x_glist); > > if (nlines < 1) nlines = 1; > >- if (!widthspec) > >+ if (!widthspec_c) > > { > > while (ncolumns < 3) > > { > >- tempbuf[outchars++] = ' '; > >+ tempbuf[outchars_b++] = ' '; > > ncolumns++; > > } > > } > >- else ncolumns = widthspec; > >+ else ncolumns = widthspec_c; > > pixwide = ncolumns * fontwidth + (LMARGIN + RMARGIN); > > pixhigh = nlines * fontheight + (TMARGIN + BMARGIN); > > > >@@ -259,31 +292,32 @@ > > sys_vgui("pdtk_text_new .x%lx.c {%s %s text} %f %f {%.*s} %d > >%s\n", > > canvas, x->x_tag, rtext_gettype(x)->s_name, > > dispx + LMARGIN, dispy + 
TMARGIN, > >- outchars, tempbuf, sys_hostfontsize(font), > >+ outchars_b, tempbuf, sys_hostfontsize(font), > > (glist_isselected(x->x_glist, > > &x->x_glist->gl_gobj)? "blue" : "black")); > > else if (action == SEND_UPDATE) > > { > > sys_vgui("pdtk_text_set .x%lx.c %s {%.*s}\n", > >- canvas, x->x_tag, outchars, tempbuf); > >+ canvas, x->x_tag, outchars_b, tempbuf); > > if (pixwide != x->x_drawnwidth || pixhigh != x->x_drawnheight) > > text_drawborder(x->x_text, x->x_glist, x->x_tag, > > pixwide, pixhigh, 0); > > if (x->x_active) > > { > >- if (selend > selstart) > >+ if (selend_b > selstart_b) > > { > > sys_vgui(".x%lx.c select from %s %d\n", canvas, > >- x->x_tag, selstart); > >+ x->x_tag, u8_charnum(x->x_buf, selstart_b)); > > sys_vgui(".x%lx.c select to %s %d\n", canvas, > >- x->x_tag, selend + (sys_oldtclversion ? 0 : -1)); > >+ x->x_tag, u8_charnum(x->x_buf, selend_b) > >+ + (sys_oldtclversion ? 0 : -1)); > > sys_vgui(".x%lx.c focus \"\"\n", canvas); > > } > > else > > { > > sys_vgui(".x%lx.c select clear\n", canvas); > > sys_vgui(".x%lx.c icursor %s %d\n", canvas, x->x_tag, > >- selstart); > >+ u8_charnum(x->x_buf, selstart_b)); > > sys_vgui(".x%lx.c focus %s\n", canvas, x->x_tag); > > } > > } > >@@ -448,12 +482,12 @@ > > .... > > } */ > > if (x->x_selstart && (x->x_selstart == x->x_selend)) > >- x->x_selstart--; > >+ u8_dec(x->x_buf, &x->x_selstart); > > } > > else if (n == 127) /* delete */ > > { > > if (x->x_selend < x->x_bufsize && (x->x_selstart == x- > >>x_selend)) > >- x->x_selend++; > >+ u8_inc(x->x_buf, &x->x_selend); > > } > > > > ndel = x->x_selend - x->x_selstart; > >@@ -466,7 +500,13 @@ > >/* at Guenter's suggestion, use 'n>31' to test wither a character > >might > >be printable in whatever 8-bit character set we find ourselves. */ > > > >- if (n == '\n' || (n > 31 && n != 127)) > >+/*-- moo: > >+ ... 
but test with "<" rather than "!=" in order to accomodate > >unicode > >+ codepoints for n (which we get since Tk is sending the "%A" > >substitution > >+ for bind <Key>), effectively reducing the coverage of this clause > >to 7 > >+ bits. Case n>127 is covered by the next clause. > >+*/ > >+ if (n == '\n' || (n > 31 && n < 127)) > > { > > newsize = x->x_bufsize+1; > > x->x_buf = resizebytes(x->x_buf, x->x_bufsize, newsize); > >@@ -476,20 +516,39 @@ > > x->x_bufsize = newsize; > > x->x_selstart = x->x_selstart + 1; > > } > >+ /*--moo: check for unicode codepoints beyond 7-bit ASCII --*/ > >+ else if (n > 127) > >+ { > >+ int ch_nbytes = u8_wc_nbytes(n); > >+ newsize = x->x_bufsize + ch_nbytes; > >+ x->x_buf = resizebytes(x->x_buf, x->x_bufsize, newsize); > >+ for (i = x->x_bufsize; i > x->x_selstart; i--) > >+ x->x_buf[i] = x->x_buf[i-1]; > >+ x->x_bufsize = newsize; > >+ /*-- moo: assume canvas_key() has encoded keysym as > >UTF-8 */ > >+ strncpy(x->x_buf+x->x_selstart, keysym->s_name, > >ch_nbytes); > >+ x->x_selstart = x->x_selstart + ch_nbytes; > >+ } > > x->x_selend = x->x_selstart; > > x->x_glist->gl_editor->e_textdirty = 1; > > } > > else if (!strcmp(keysym->s_name, "Right")) > > { > > if (x->x_selend == x->x_selstart && x->x_selstart < x- > >>x_bufsize) > >- x->x_selend = x->x_selstart = x->x_selstart + 1; > >+ { > >+ u8_inc(x->x_buf, &x->x_selstart); > >+ x->x_selend = x->x_selstart; > >+ } > > else > > x->x_selstart = x->x_selend; > > } > > else if (!strcmp(keysym->s_name, "Left")) > > { > > if (x->x_selend == x->x_selstart && x->x_selstart > 0) > >- x->x_selend = x->x_selstart = x->x_selstart - 1; > >+ { > >+ u8_dec(x->x_buf, &x->x_selstart); > >+ x->x_selend = x->x_selstart; > >+ } > > else > > x->x_selend = x->x_selstart; > > } > >@@ -497,18 +556,18 @@ > > else if (!strcmp(keysym->s_name, "Up")) > > { > > if (x->x_selstart) > >- x->x_selstart--; > >+ u8_dec(x->x_buf, &x->x_selstart); > > while (x->x_selstart > 0 && x->x_buf[x->x_selstart] != '\n') > >- 
x->x_selstart--; > >+ u8_dec(x->x_buf, &x->x_selstart); > > x->x_selend = x->x_selstart; > > } > > else if (!strcmp(keysym->s_name, "Down")) > > { > > while (x->x_selend < x->x_bufsize && > > x->x_buf[x->x_selend] != '\n') > >- x->x_selend++; > >+ u8_inc(x->x_buf, &x->x_selend); > > if (x->x_selend < x->x_bufsize) > >- x->x_selend++; > >+ u8_inc(x->x_buf, &x->x_selend); > > x->x_selstart = x->x_selend; > > } > > rtext_senditup(x, SEND_UPDATE, &w, &h, &indx); > >Index: src/s_utf8.h > >=================================================================== > >--- src/s_utf8.h (revision 0) > >+++ src/s_utf8.h (revision 0) > >@@ -0,0 +1,88 @@ > >+#ifndef S_UTF8_H > >+#define S_UTF8_H > >+ > >+/*--moo--*/ > >+#ifndef u_int32_t > >+# define u_int32_t unsigned int > >+#endif > >+ > >+#ifndef UCS4 > >+# define UCS4 u_int32_t > >+#endif > >+ > >+/* UTF8_SUPPORT_FULL_UCS4 > >+ * define this to support the full potential range of UCS-4 > >codepoints > >+ * (in anticipation of a future UTF-8 standard) > >+ */ > >+/*#define UTF8_SUPPORT_FULL_UCS4 1*/ > >+#undef UTF8_SUPPORT_FULL_UCS4 > >+ > >+/* UTF8_MAXBYTES > >+ * maximum number of bytes required to represent a single > >character in UTF-8 > >+ * > >+ * UTF8_MAXBYTES1 = UTF8_MAXBYTES+1 > >+ * maximum bytes per character including NUL terminator > >+ */ > >+#ifdef UTF8_SUPPORT_FULL_UCS4 > >+# ifndef UTF8_MAXBYTES > >+# define UTF8_MAXBYTES 6 > >+# endif > >+# ifndef UTF8_MAXBYTES1 > >+# define UTF8_MAXBYTES1 7 > >+# endif > >+#else > >+# ifndef UTF8_MAXBYTES > >+# define UTF8_MAXBYTES 4 > >+# endif > >+# ifndef UTF8_MAXBYTES1 > >+# define UTF8_MAXBYTES1 5 > >+# endif > >+#endif > >+/*--/moo--*/ > >+ > >+/* is c the start of a utf8 sequence? 
*/ > >+#define isutf(c) (((c)&0xC0)!=0x80) > >+ > >+/* convert UTF-8 data to wide character */ > >+int u8_toucs(u_int32_t *dest, int sz, char *src, int srcsz); > >+ > >+/* the opposite conversion */ > >+int u8_toutf8(char *dest, int sz, u_int32_t *src, int srcsz); > >+ > >+/* moo: get byte length of character number, or 0 if not supported */ > >+int u8_wc_nbytes(u_int32_t ch); > >+ > >+/* moo: compute required storage for UTF-8 encoding of 's[0..n-1]' */ > >+int u8_wcs_nbytes(u_int32_t *ucs, int size); > >+ > >+/* single character to UTF-8, no NUL termination */ > >+int u8_wc_toutf8(char *dest, u_int32_t ch); > >+ > >+/* moo: single character to UTF-8, with NUL termination */ > >+int u8_wc_toutf8_nul(char *dest, u_int32_t ch); > >+ > >+/* character number to byte offset */ > >+int u8_offset(char *str, int charnum); > >+ > >+/* byte offset to character number */ > >+int u8_charnum(char *s, int offset); > >+ > >+/* return next character, updating an index variable */ > >+u_int32_t u8_nextchar(char *s, int *i); > >+ > >+/* move to next character */ > >+void u8_inc(char *s, int *i); > >+ > >+/* move to previous character */ > >+void u8_dec(char *s, int *i); > >+ > >+/* moo: move pointer to next character */ > >+void u8_inc_ptr(char **sp); > >+ > >+/* moo: move pointer to previous character */ > >+void u8_dec_ptr(char **sp); > >+ > >+/* returns length of next utf-8 sequence */ > >+int u8_seqlen(char *s); > >+ > >+#endif /* S_UTF8_H */ > ><test-utf8.pd> > > > > > > ---------------------------------------------------------------------------- > > "[T]he greatest purveyor of violence in the world today [is] my own > government." - Martin Luther King, Jr. > > > > > _______________________________________________ > Pd-dev mailing list > [email protected] > http://lists.puredata.info/listinfo/pd-dev _______________________________________________ Pd-dev mailing list [email protected] http://lists.puredata.info/listinfo/pd-dev
