Hi,

I just tried adding Chinese parsing ability to htmlspecialchars. If you do any
web development in PHP in Chinese, it is one of the annoying 'missing
features'.


Can somebody help finish this off? There are 2 issues:
- I get a compile-time warning saying
   html.c:76: warning: comparison is always false due to limited range of data type
  (I'm not a C expert at all; is a (char *) expected to return < 127?)


- I have no idea what kind of check to wrap this in to test for the
  character set or locale.
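
On the first issue, the likely cause is that oldnext and oldprev are declared
as plain char * in the patch while old is unsigned char *. Where char is
signed, a byte read through a char pointer can never be >= 0xa1, which is what
the warning says. On the second, one possible check is the codeset of the
current LC_CTYPE locale. The sketch below illustrates both; every helper name
in it is hypothetical, nl_langinfo() is POSIX only, and whether the server
locale is the right signal for a page's charset is left open.

```c
#include <ctype.h>
#include <langinfo.h>

/* Value of a byte when read through a signed char, as happens with a
 * plain char pointer on platforms where char is signed. */
static int read_as_signed(unsigned char byte)
{
    return (signed char)byte;   /* a high byte such as 0xa4 turns negative */
}

/* Value of the same byte when read through an unsigned char pointer. */
static int read_as_unsigned(unsigned char byte)
{
    return byte;                /* 0xa4 stays 164 */
}

/* Hypothetical helper: does a codeset name start with "big5" (any case)?
 * Assumes names such as "Big5" or "BIG5-HKSCS". */
static int codeset_is_big5(const char *name)
{
    const char *want = "big5";

    while (*want) {
        if (tolower((unsigned char)*name) != *want)
            return 0;           /* also stops at the end of a short name */
        name++;
        want++;
    }
    return 1;
}

/* Hypothetical wrapper: ask the C library which codeset the current
 * LC_CTYPE locale uses. */
static int locale_is_big5(void)
{
    return codeset_is_big5(nl_langinfo(CODESET));
}
```

Declaring oldnext and oldprev as unsigned char * would make the range checks
behave like read_as_unsigned() rather than read_as_signed().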

Anyway, here's the patch, if someone wants to comment on it.

Index: html.c
===================================================================
RCS file: /repository/php4/ext/standard/html.c,v
retrieving revision 1.22
diff -u -r1.22 html.c
--- html.c      2000/11/24 16:17:58     1.22
+++ html.c      2001/02/21 06:57:30
@@ -51,8 +51,8 @@
 
 PHPAPI char *php_escape_html_entities(unsigned char *old, int oldlen, int *newlen, int all, int quote_style)
 {
-       int i, maxlen, len;
-       char *new;
+       int i, maxlen, len, ischinese;
+       char *new, *oldnext, *oldprev;
 
        maxlen = 2 * oldlen;
        if (maxlen < 128)
@@ -62,6 +62,35 @@
 
        i = oldlen;
        while (i--) {
+
+            /* needs some kind of if LC_CTYPE to check for encoding */ 
+            /* if charset=chinese?? */
+              /* check if this is the first character in a chinese pair */
+             ischinese = 0; 
+             if (i > 1) { 
+               oldnext = old+1; 
+               if ((*old >= 0xa1) &&
+                   (*old <= 0xf9) &&
+                   (((*oldnext >= 0x40) &&
+                     (*oldnext <= 0x73)) ||
+                    ((*oldnext >= 0xa1) &&
+                     (*oldnext <= 0xfe)))  
+                  ) ischinese = 1;
+             }
+             /* check if this is the second character in a chinese pair */
+             if ((i != oldlen) && (!ischinese)) {
+               oldprev = old-1;
+               if ((*oldprev >= 0xa1) &&  
+                   (*oldprev <= 0xf9) &&
+                   (((*old >= 0x40) &&
+                     (*old <= 0x73)) ||
+                    ((*old >= 0xa1) &&
+                     (*old <= 0xfe)))
+                  ) ischinese = 1;
+             }
+
+             if (!ischinese) { 
+               
                if (len + 9 > maxlen)
                        new = erealloc (new, maxlen += 128);
                if (38 == *old) {
@@ -87,9 +116,13 @@
                } else {
                        new [len++] = *old;
                }
-               old++;
+             } else {
+               /* it is chinese - ignore it */
+               new [len++] = *old;
+             }
+             old++;
        }
-    new [len] = '\0';
+        new [len] = '\0';
        *newlen = len;
 
        return new;
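
The patch tests every byte against both of its neighbours. Because i has
already been decremented by while (i--), the (i != oldlen) guard is always
true, so the very first byte is compared against old-1, which lies before the
buffer. A possible alternative, sketched here under assumptions, is to scan
forward and step over a whole pair at once. The helper names are hypothetical,
and the low trail-byte range 0x40-0x7e is an assumption about Big5 (the patch
uses 0x40-0x73).

```c
#include <stddef.h>

/* Lead-byte range taken from the patch. */
static int is_big5_lead(unsigned char c)
{
    return c >= 0xa1 && c <= 0xf9;
}

/* Trail-byte ranges; the upper bound 0x7e is assumed, the patch has 0x73. */
static int is_big5_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7e) || (c >= 0xa1 && c <= 0xfe);
}

/* Hypothetical helper: byte length of the character starting at s[pos].
 * Returns 2 for a lead/trail pair and 1 otherwise. It only ever looks
 * forward, and only when a following byte exists. */
static size_t big5_char_len(const unsigned char *s, size_t len, size_t pos)
{
    if (pos + 1 < len && is_big5_lead(s[pos]) && is_big5_trail(s[pos + 1]))
        return 2;
    return 1;
}

/* Count the bytes that are not part of a pair, i.e. the ones that would
 * still go through the normal entity-escaping branch. */
static size_t count_single_bytes(const unsigned char *s, size_t len)
{
    size_t pos = 0, singles = 0;

    while (pos < len) {
        size_t step = big5_char_len(s, len, pos);

        if (step == 1)
            singles++;
        pos += step;            /* a pair is skipped in one step */
    }
    return singles;
}
```

All pointers here are unsigned char *, so the range comparisons are meaningful
regardless of whether plain char is signed.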



-- 
Technical Director
Linux Center (HK) Ltd.
www.hklc.com



-- 
PHP Development Mailing List <http://www.php.net/>