This bug affects Xerces versions 1.2.* and 1.3.*. It does not occur in 1.0.*.

It seems the XMLDTDScanner relies on the SystemId property of org.xml.sax.InputSource when an XML document contains external unparsed entities (example XML is below). If the SystemId is not set, you get a NullPointerException at org.apache.xerces.utils.StringPool.addSymbol(StringPool.java:348).

Constructing an InputSource from a Reader or InputStream triggers this behavior, because getSystemId() then returns null. As a workaround, call setSystemId(String) to set the system ID manually.

This requirement should either be removed, or be handled with a proper error message (and documented).

Example XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ENTITY hello SYSTEM "hello.gif" NDATA gif>
]>
<tag>hello world</tag>

Example program:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class TestXerces {
    public static void main(String [] argv) {
        String uri = argv[0];
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
            // An InputSource built from a Reader has a null system ID
            InputSource i = new InputSource(new FileReader(uri));
            // uncomment this line to make it work
            //i.setSystemId(uri);
            reader.parse(i);
        } catch (SAXException se) {
            // print the wrapped exception first, if there is one
            if (se.getException() != null)
                se.getException().printStackTrace();
            se.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

usage: java TestXerces <filename>
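
The workaround described above can be isolated in a small helper. The following is only a sketch: the class name SystemIdWorkaround and the method sourceFor are hypothetical names, not part of Xerces or SAX, and using File.toURI() to build the system ID is an assumption (the report itself passes the raw filename to setSystemId). It uses only the standard org.xml.sax.InputSource class, so it shows the null system ID without needing Xerces on the classpath.

```java
import java.io.File;
import java.io.Reader;
import java.io.StringReader;
import org.xml.sax.InputSource;

public class SystemIdWorkaround {

    // Hypothetical helper: wraps a Reader in an InputSource and records
    // where the document came from, so the system ID is never null.
    static InputSource sourceFor(Reader reader, File origin) {
        InputSource source = new InputSource(reader);
        // Assumption: a file: URI is used as the system ID. The original
        // report calls setSystemId with the plain filename instead.
        source.setSystemId(origin.toURI().toString());
        return source;
    }

    public static void main(String[] args) {
        // An InputSource built from a character stream has no system ID.
        InputSource bare = new InputSource(new StringReader("<tag>hello world</tag>"));
        System.out.println("without workaround: " + bare.getSystemId());

        // The same kind of source after the workaround has been applied.
        InputSource fixed = sourceFor(new StringReader("<tag>hello world</tag>"), new File("test.xml"));
        System.out.println("with workaround: " + fixed.getSystemId());
    }
}
```

In the example program, the returned InputSource would be passed to reader.parse() in place of the one built directly from the FileReader.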