Hi Daniel,

> I don't really know SGML, so such patches are welcome. I just have one
> problem with the code, it calls GROW only when the end of the buffer is
> detected with a NUL, I would rather have it called more preemptively
> in the loop to avoid a potential weakness in the case of multibyte chars.

I have changed the patch to call GROW in the loop each time before moving
on to the next character. (I don't know whether I should be calling SHRINK
as well, though?)
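
To make the new loop easier to review outside the parser, here is a minimal
standalone sketch of the dash-toggling scan that the patch implements. It
assumes a plain NUL-terminated byte string rather than the parser input, so
there is no GROW or NEXTL and no multibyte handling. The function name
scan_sgml_comment and its return convention are hypothetical and are not
part of libxml2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: scans the text that follows
 * "<!--" and copies the comment body into out, dropping each "--" pair
 * in the same way the patch does with len -= 2.
 * Returns the number of input bytes consumed, including the closing '>',
 * or -1 if the comment is unterminated or out is too small. */
static long
scan_sgml_comment(const char *in, char *out, size_t outsize)
{
    size_t i = 0, len = 0;
    int allow_gt = 1;   /* 1: '>' is ordinary text; 0: '>' ends the comment */
    int dashes = 0;     /* consecutive '-' characters seen so far (0 or 1) */

    assert(in != NULL && out != NULL);

    while (in[i] != '\0' && (in[i] != '>' || allow_gt)) {
        if (len + 1 >= outsize)     /* keep room for this byte and the NUL */
            return -1;
        out[len++] = in[i];
        if (in[i] == '-') {
            if (++dashes == 2) {
                allow_gt = !allow_gt;   /* each "--" toggles the state */
                dashes = 0;
                len -= 2;               /* drop the "--" just copied */
            }
        } else {
            dashes = 0;
        }
        i++;
    }
    if (in[i] != '>')               /* ran off the end of the input */
        return -1;
    out[len] = '\0';
    return (long)(i + 1);
}
```

Called on the text after the opening "<!--", for example
" foo>bar -- foo-->bar world -->", the sketch stops only at the final ">",
because the earlier ">" characters occur while allow_gt is set.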

>   Note also that I prefer patches to cut and paste of full routines; a patch
> gives me the context of what was changed.

Here is a unified diff; is that the right format?

Best regards,

Michael

-- 
Print XML with Prince!
http://www.princexml.com
@@ -2969,18 +2970,24 @@
  * htmlParseComment:
  * @ctxt:  an HTML parser context
  *
- * Parse an XML (SGML) comment <!-- .... -->
+ * Parse an HTML (SGML) comment <!-- .... -->
  *
- * [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
+ * Note that unlike XML comments, HTML/SGML comments can contain "--".
+ * This sequence toggles whether ">" will end the comment or not.
+ * The following three comments are all valid HTML/SGML comments:
+ *
+ *     <!-- Hello -- blah -- world -->
+ *     <!-- Hello -- blah -- world -- foo >
+ *     <!-- foo>bar -- foo-->bar world -->
  */
 static void
 htmlParseComment(htmlParserCtxtPtr ctxt) {
     xmlChar *buf = NULL;
     int len;
     int size = HTML_PARSER_BUFFER_SIZE;
-    int q, ql;
-    int r, rl;
     int cur, l;
+    int allow_gt;
+    int dashes;
     xmlParserInputState state;
 
     /*
@@ -2999,15 +3006,13 @@
        ctxt->instate = state;
        return;
     }
-    q = CUR_CHAR(ql);
-    NEXTL(ql);
-    r = CUR_CHAR(rl);
-    NEXTL(rl);
     cur = CUR_CHAR(l);
     len = 0;
+    allow_gt = 1;
+    dashes = 0;
     while (IS_CHAR(cur) &&
-           ((cur != '>') ||
-           (r != '-') || (q != '-'))) {
+          ((cur != '>') || allow_gt)) {
+
        if (len + 5 >= size) {
            xmlChar *tmp;
 
@@ -3021,18 +3026,27 @@
            }
            buf = tmp;
        }
-       COPY_BUF(ql,buf,len,q);
-       q = r;
-       ql = rl;
-       r = cur;
-       rl = l;
+       COPY_BUF(l,buf,len,cur);
+       
+       if (cur == '-')
+       {
+           ++dashes;
+
+           if (dashes == 2)
+           {
+               allow_gt = !allow_gt;
+               dashes = 0;
+               len -= 2;
+           }
+       }
+       else
+       {
+           dashes = 0;
+       }
+       
+       GROW;
        NEXTL(l);
        cur = CUR_CHAR(l);
-       if (cur == 0) {
-           SHRINK;
-           GROW;
-           cur = CUR_CHAR(l);
-       }
     }
     buf[len] = 0;
     if (!IS_CHAR(cur)) {
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml