http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5382

Limitation of the number of namespace declarations

           Summary: Limitation of the number of namespace declarations
           Product: Xerces-J
           Version: 1.4.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: DOM
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Xerces-J 1.4.4 cannot parse XML documents that contain a very large number
of namespace declarations (i.e. attributes such as xmlns:prefix="uri").
This bug was encountered in a production system that uses XSLT (Xalan) to
process very large XML documents. I propose a simple fix for this problem
(see below).

HOW TO REPRODUCE THIS BUG:
~~~~~~~~~~~~~~~~~~~~~~~~~~

To reproduce this bug, try to parse an XML document structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <a:para xmlns:a="urn:a"/>
  <a:para xmlns:a="urn:a"/>
  <a:para xmlns:a="urn:a"/>
  ...
  <a:para xmlns:a="urn:a"/>
  <a:para xmlns:a="urn:a"/>
  <a:para xmlns:a="urn:a"/>
  <b:test xmlns:b="urn:b"/>
</root>

The <root> element must have 16360 <a:para xmlns:a="urn:a"/> child elements
to reproduce the bug. The text nodes that contain only the whitespace used
for indentation are important.

If you try to parse this kind of XML document with Xerces-J 1.4.4, the
following NullPointerException is thrown:

java.lang.NullPointerException
        at org.apache.xerces.dom.DeferredElementNSImpl.synchronizeData(DeferredElementNSImpl.java:154)
        at org.apache.xerces.dom.ElementImpl.getNodeName(ElementImpl.java:144)
        at NSLimitationBug.main(NSLimitationBug.java:26)

Here is the source code of my NSLimitationBug class, which produces the
NullPointerException. You should change the path of the XML file to load.

import java.io.*;
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

public class NSLimitationBug {
    public static void main(String[] args) {
        try {
            File file = new File("E:\\XercesBug\\large-in.xml");
            Reader reader = new BufferedReader(new FileReader(file));

            DOMParser parser = new DOMParser();
            parser.setFeature("http://xml.org/sax/features/validation", false);
            // Deferred node expansion must be on: the bug is in the deferred DOM.
            parser.setFeature("http://apache.org/xml/features/dom/defer-node-expansion", true);

            InputSource source = new InputSource(reader);
            parser.parse(source);

            Document doc = parser.getDocument();
            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = children.getLength();
            // The last child is a text node, so the last element is at count - 2.
            Element lastElem = (Element) children.item(count - 2);
            System.out.println("Name: '" + lastElem.getNodeName() + "'");
            System.out.println("Namespace URI: '" + lastElem.getNamespaceURI() + "'");
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

PROPOSED FIX:
~~~~~~~~~~~~

Apparently this bug is due to a coding error in the
org.apache.xerces.dom.DeferredDocumentImpl class. In the method
org.apache.xerces.dom.DeferredDocumentImpl#getNodeURI(int nodeIndex, boolean free),
an int is cast down to a short for no reason. For the last element child of
the <root> element, the integer to cast is 32768. As the maximum short value
is 32767, the integer 32768 is cast to the short -32768. In fact:

    (short)32768 == -32768

This later results in the NullPointerException.

To fix this bug, simply remove the cast to short and change the return type
of the two 'getNodeURI' methods to int. Here is the code of these two
methods after applying this fix:

    /** Returns the URI of the given node. */
    public int getNodeURI(int nodeIndex) {
        return getNodeURI(nodeIndex, true);
    }

    /**
     * Returns the URI of the given node.
     * @param free True to free URI index.
     */
    public int getNodeURI(int nodeIndex, boolean free) {

        if (nodeIndex == -1) {
            return -1;
        }

        int chunk = nodeIndex >> CHUNK_SHIFT;
        int index = nodeIndex & CHUNK_MASK;
        if (free) {
            return clearChunkIndex(fNodeURI, chunk, index);
        }
        return getChunkIndex(fNodeURI, chunk, index);

    } // getNodeURI(int):int

NOTE:
~~~~~

I first noticed this bug in Xerces-J 1.2.0. With that version there is no
NullPointerException; however, the namespace URI of the last element is
'null' instead of 'urn:b'.
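
The narrowing described under PROPOSED FIX can be checked in isolation. The following is only a minimal sketch and not Xerces code: the class NarrowingCastDemo and its two methods are hypothetical names introduced for illustration. It compares a value that is cast to short with the same value kept as an int, around the 32767 boundary.

```java
// Illustration only: NarrowingCastDemo, narrowToShort and keepAsInt are
// hypothetical names; none of them exists in Xerces.
final class NarrowingCastDemo {

    /** Mimics the faulty path: the int value is narrowed to a short. */
    static short narrowToShort(int value) {
        // A narrowing cast keeps only the low 16 bits, so values above
        // Short.MAX_VALUE (32767) wrap around to negative numbers.
        return (short) value;
    }

    /** Mimics the fixed path: the value stays an int and is unchanged. */
    static int keepAsInt(int value) {
        return value;
    }

    public static void main(String[] args) {
        int[] samples = { 32766, 32767, 32768, 32769 };
        for (int value : samples) {
            System.out.println(value
                    + " as short: " + narrowToShort(value)
                    + ", as int: " + keepAsInt(value));
        }
    }
}
```

For 32768 the short result is -32768, which matches the value given in the report, while the int result stays 32768.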

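
Writing the input file for the reproduction steps by hand is impractical, so it could be generated instead. The sketch below is an assumption-laden helper, not part of the original report: the class name LargeInputGenerator, the default file name, and the two-space indentation are all hypothetical choices. It emits the document layout shown under HOW TO REPRODUCE THIS BUG, with one whitespace-only text node before each child element.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration only: LargeInputGenerator is a hypothetical helper.
final class LargeInputGenerator {

    /** Builds a document with paraCount <a:para> children plus one <b:test>. */
    static String build(int paraCount) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<root>\n");
        for (int i = 0; i < paraCount; i++) {
            // Each child redeclares the namespace, as in the report.
            xml.append("  <a:para xmlns:a=\"urn:a\"/>\n");
        }
        // The element whose name and namespace URI the test program reads.
        xml.append("  <b:test xmlns:b=\"urn:b\"/>\n");
        xml.append("</root>\n");
        return xml.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed default output path; pass another path as the first argument.
        String target = args.length > 0 ? args[0] : "large-in.xml";
        // 16360 is the child count given in the report.
        Files.write(Paths.get(target), build(16360).getBytes(StandardCharsets.UTF_8));
    }
}
```

The generated file can then be passed to NSLimitationBug after adjusting the path in that class.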