NekoParser returns cryptic error messages when parsing bad html
---------------------------------------------------------------

                 Key: SHINDIG-987
                 URL: https://issues.apache.org/jira/browse/SHINDIG-987
             Project: Shindig
          Issue Type: Bug
    Affects Versions: trunk
            Reporter: Paul Lindner


startImportantElement can throw exceptions when parsing malformed HTML.

Given this HTML:

    <div id="div_super" class="div_super" valign:"middle"></div>

You get an exception like this:

org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.
        org.apache.xerces.dom.CoreDocumentImpl.createAttribute(Unknown Source)
        org.apache.xerces.dom.ElementImpl.setAttribute(Unknown Source)
        org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startImportantElement(NekoSimplifiedHtmlParser.java:292)
        org.apache.shindig.gadgets.parse.nekohtml.NekoSimplifiedHtmlParser$DocumentHandler.startElement(NekoSimplifiedHtmlParser.java:242)
        org.apache.shindig.gadgets.parse.nekohtml.SocialMarkupHtmlParser$SocialMarkupDocumentHandler.startElement(SocialMarkupHtmlParser.java:130)

Which is caused here:

      for (int i = 0; i < xmlAttributes.getLength(); i++) {
        if (xmlAttributes.getURI(i) != null) {
          element.setAttributeNS(xmlAttributes.getURI(i), xmlAttributes.getQName(i),
              xmlAttributes.getValue(i));
        } else {
          element.setAttribute(xmlAttributes.getLocalName(i), xmlAttributes.getValue(i));
        }
      }

because we're trying to set an attribute (not a tag) with a colon in its name:
the malformed valign:"middle" appears to be treated as an attribute name.

We should probably add some error checking here so that we can more easily 
identify the offending HTML without using a debugger.
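
One possible shape for that error checking, as a minimal sketch rather than
the actual fix: catch the DOMException around setAttribute and rethrow it
with the element and attribute name included in the message. The class
AttributeErrorContext and the method setAttributeChecked are hypothetical
names, not part of Shindig, and the attribute name passed in main is only an
assumption about what the parser hands to setAttribute for the HTML above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Shindig.
public class AttributeErrorContext {

    // Sets the attribute; on failure, rethrows a DOMException with the same
    // error code whose message names the element and the offending attribute.
    static void setAttributeChecked(Element element, String name, String value) {
        try {
            element.setAttribute(name, value);
        } catch (DOMException e) {
            DOMException wrapped = new DOMException(e.code,
                "Cannot set attribute '" + name + "' (value '" + value + "') on <"
                    + element.getTagName() + ">: " + e.getMessage());
            wrapped.initCause(e);  // keep the original exception for the stack trace
            throw wrapped;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element div = doc.createElement("div");

        // A well-formed attribute is set as usual.
        setAttributeChecked(div, "id", "div_super");

        // Assumed attribute name for the malformed markup; the quote
        // characters alone make it an invalid XML name.
        try {
            setAttributeChecked(div, "valign:\"middle\"", "");
        } catch (DOMException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the loop above, the same try/catch could wrap both the setAttributeNS and
the setAttribute calls, so that the log shows which attribute of which
element was rejected.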


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
