patch for HTML::Parser 3.34 to compile with Unicode on VMS

PPrymmer Fri, 28 Nov 2003 10:42:27 -0800


Greetings Gisle,

When building HTML-Parser-3.34 on VMS 7.3-1
against Perl 5.8.1, I answered 'y' to the following question:

Do you want decoding on unicode entities? [no]  y

I then started the make utility, but the compilation
died on about three separate "unsigned char" versus
"char" type incompatibilities.  One of the first such messages
looked like this (please note that cutting and pasting
lines and sending them through this email agent
will result in odd line wrapping):

MCR dka300:[perl.perl-5_8_1_root]perl.exe mkpfunc >pfunc.h
CC/DECC /Include=[]/Standard=Relaxed_ANSI/Prefix=All/Obj=.obj
/NOANSI_ALIAS/float=ieee/ieee=denorm_results
/Define=(MARKED_SECTION,UNICODE_ENTITIES,"VERSION=""3.34""","XS_VERSION=""3.34""")
/Include=(perl_root:[lib.VMS_AXP.5_8_1.CORE])/NoList
  PARSER.c

                    char *tmp = uvuni_to_utf8(buf, num);
..............................................^
%CC-W-PTRMISMATCH1, In the initializer for tmp, the referenced type of the
pointer value "buf" is "char", which is not compatible with
"unsigned char" because they differ by signed/unsigned attribute.
at line number 136 in file USER:[PPRYMMER.HTML-PARSER-3_34]UTIL.C;1
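
To show the complaint in isolation from the Perl headers, here is a
minimal standalone sketch in C.  Note that U8 below is only my own
typedef for unsigned char, and encode_utf8() is a hypothetical stand-in
with the same pointer-in, pointer-out shape as uvuni_to_utf8(); neither
is taken from the Perl or HTML::Parser sources.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char U8;   /* assumption: stands in for Perl's U8 type */

/* Hypothetical stand-in for uvuni_to_utf8(): writes the UTF-8 encoding
 * of code point cp (limited to <= 0xFFFF in this sketch) into d and
 * returns a pointer just past the last byte written. */
U8 *encode_utf8(U8 *d, unsigned long cp)
{
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        *d++ = (U8)cp;
    } else if (cp < 0x800) {
        *d++ = (U8)(0xC0 | (cp >> 6));
        *d++ = (U8)(0x80 | (cp & 0x3F));
    } else {
        *d++ = (U8)(0xE0 | (cp >> 12));
        *d++ = (U8)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (U8)(0x80 | (cp & 0x3F));
    }
    return d;
}

/* Returns the number of bytes needed to encode cp. */
size_t encoded_len(unsigned long cp)
{
    U8 buf[4];                        /* U8, not char: matches the callee */
    U8 *tmp = encode_utf8(buf, cp);   /* both pointers are unsigned char * */
    return (size_t)(tmp - buf);
}
```

If buf and tmp are declared as plain char in encoded_len(), a strict
compiler should object to the call in the same way as the PTRMISMATCH1
message above, since char and unsigned char are distinct types.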

I was, however, able to get the HTML::Parser extension to
build by switching several types from char over to U8TYPE
and adding a couple of typecasts.  Here is a unified diff
of the changes I had to make, offered as a suggestion.
I have not had time to test this with the non-Unicode entity
build, nor have I tested this change on platforms other
than VMS:

--- util.c;1      Fri Aug 15 14:38:37 2003
+++ util.c  Fri Nov 28 12:03:17 2003
@@ -71,10 +71,14 @@
     char *end = s + len;
     char *ent_start;

+#ifdef UNICODE_ENTITIES
+    U8TYPE *repl;
+#else
     char *repl;
+#endif
     STRLEN repl_len;
 #ifdef UNICODE_ENTITIES
-    char buf[UTF8_MAXLEN];
+    U8TYPE buf[UTF8_MAXLEN];
     int repl_utf8;
 #else
     char buf[1];
@@ -133,7 +137,7 @@
                repl_utf8 = 0;
            }
            else {
-               char *tmp = uvuni_to_utf8(buf, num);
+               U8TYPE *tmp = uvuni_to_utf8(buf, num);
                repl = buf;
                repl_len = tmp - buf;
                repl_utf8 = 1;
@@ -154,16 +158,22 @@
          if (ent_name != s && entity2char) {
            SV** svp = hv_fetch(entity2char, ent_name, s - ent_name, 0);
            if (svp) {
-               repl = SvPV(*svp, repl_len);
 #ifdef UNICODE_ENTITIES
+               repl = (U8TYPE*) SvPV(*svp, repl_len);
                repl_utf8 = SvUTF8(*svp);
+#else
+               repl = SvPV(*svp, repl_len);
 #endif
            }
          }
      }

      if (repl) {
+#ifdef UNICODE_ENTITIES
+         U8TYPE *repl_allocated = 0;
+#else
          char *repl_allocated = 0;
+#endif
          if (*s == ';')
            s++;
          t--;  /* '&' already copied, undo it */
@@ -174,7 +184,7 @@
            if (len) {
                /* need to upgrade the part that we have looked though */
                STRLEN old_len = len;
-               char *ustr = bytes_to_utf8(SvPVX(sv), &len);
+               U8TYPE *ustr = bytes_to_utf8((U8TYPE*)SvPVX(sv), &len);
                STRLEN grow = len - old_len;
                if (grow) {
                  /* XXX It might already be enough gap, so we don't need this,
End of Patch.

Here I'll also include it as a MIME attachment so as to
avoid other trouble with the line wrapping mentioned
previously:

(See attached file: util.patch)

Regards,

Peter Prymmer