Currently the fix involves some changes in DocumentScannerImpl.java and XMLEntityScanner, based on the depth of entities and on the current entity being null.
I was looking for something cleaner that could tell the scanner about the end of the document. I will look into this again and put back the change in the next few days.
I have a server app that parses millions of smallish documents.
Performance has been improved a lot by reusing XMLReaders. It's pretty good but could perhaps get better in light of the (perhaps dubious?) hints given by the profiler snippet below.
Accordingly, the theory is that throwing an (artificial) EOFException in XMLEntityScanner.load() at the end of each document/entity consumes some 25% (JDK 1.5) and some 15% (JDK 1.4.2) of the total execution time, making it the single hottest spot in the program. This is probably due to the heavy nature of exceptions, and in particular Throwable.fillInStackTrace(). If this can indeed be confirmed by others, would it perhaps be possible (and correct) to restructure the relevant xerces internals to avoid raising artificial exceptions for what appears to be normal program control flow (the documents and streams are fine)?
Configuration: Sun JDK 1.5 RC and Sun JDK 1.4.2, xerces CVS head [never using the JDK internal xerces which appears to be twice as slow in this case, for whatever reason]
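For anyone who wants to check the theory outside of xerces, here is a rough sketch that compares throwing a newly constructed EOFException on every iteration against rethrowing one preallocated instance. The class and method names are made up for illustration, and this is a crude timing loop rather than a proper benchmark, so the numbers should only be read as a hint.

```java
import java.io.EOFException;

class ExceptionCostSketch {
    // Hypothetical shared instance: its stack trace is captured once, here.
    private static final EOFException SHARED_EOF = new EOFException("end of input");

    /** Throws and catches a newly constructed EOFException n times. */
    static int throwFresh(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                // The constructor chain ends in Throwable.fillInStackTrace().
                throw new EOFException();
            } catch (EOFException e) {
                caught++;
            }
        }
        return caught;
    }

    /** Throws and catches the same preallocated EOFException n times. */
    static int throwShared(int n) {
        int caught = 0;
        for (int i = 0; i < n; i++) {
            try {
                // No construction here, so no new stack trace is captured.
                throw SHARED_EOF;
            } catch (EOFException e) {
                caught++;
            }
        }
        return caught;
    }

    public static void main(String[] args) {
        int n = 1000000;
        // Warm-up pass so both loops are likely to be compiled before timing.
        throwFresh(n);
        throwShared(n);

        long t0 = System.nanoTime();
        int fresh = throwFresh(n);
        long t1 = System.nanoTime();
        int shared = throwShared(n);
        long t2 = System.nanoTime();

        System.out.println("fresh:  " + fresh + " throws, " + (t1 - t0) / 1000000 + " ms");
        System.out.println("shared: " + shared + " throws, " + (t2 - t1) / 1000000 + " ms");
    }
}
```

Note that a reused exception carries a stale stack trace, so this variant only makes sense where the exception is a pure control-flow signal that nobody inspects.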
JDK 1.5 RC contains almost the latest Xerces. Could you tell us what you are doing, so that we can identify the problem and fix it?
Thanks, Neeraj
Here is the JDK 1.5 profiler snippet (java -server -Xprof):
-----------------------------------------------------------
  Stub + native   Method
 28.6%     0  +   487    java.lang.Throwable.fillInStackTrace
 28.6%     0  +   487    Total stub

  Thread-local ticks:
  0.1%     1             Blocked (of total)
  0.1%     2             Class loader
  0.1%     2             Compilation
  0.2%     3             Unknown: thread_state

Flat profile of 0.01 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)

Global summary of 35.44 seconds:
100.0%  1718             Received ticks
  0.7%    12             Received GC ticks
  9.7%   167             Compilation
  0.1%     2             Class loader
  0.2%     3             Unknown code

real 0m35.715s
user 0m34.170s
sys  0m0.190s
Here is the JDK 1.4 profiler snippet (java -server -Xprof):
-----------------------------------------------------------
  Stub + native   Method
 12.7%     4  +   239    java.lang.Throwable.fillInStackTrace
 12.7%     4  +   239    Total stub

  Runtime stub + native   Method
  0.2%     3  +     0    _rethrow_Java
  0.2%     3  +     0    Total runtime stubs

  Thread-local ticks:
  3.1%    61             Blocked (of total)
  0.4%     7             Interpreter
  0.1%     2             Compilation
  4.9%    93             Unknown: running frame

Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM

  Thread-local ticks:
100.0%     1             Blocked (of total)

Global summary of 43.25 seconds:
100.0%  2071             Received ticks
  3.8%    79             Received GC ticks
  6.2%   128             Compilation
  0.5%    10             Other VM operations
  0.3%     7             Interpreter
  4.5%    93             Unknown code

real 0m43.517s
user 0m42.100s
sys  0m0.530s
Trace via java -server -agentlib:hprof=cpu=samples,depth=30:
-----------------------------------------------------------
TRACE 300347:
java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
java.lang.Throwable.<init>(Throwable.java:181)
java.lang.Exception.<init>(Exception.java:29)
java.io.IOException.<init>(IOException.java:28)
java.io.EOFException.<init>(EOFException.java:32)
org.apache.xerces.impl.XMLEntityScanner.load(<Unknown Source>:Unknown line)
org.apache.xerces.impl.XMLEntityScanner.skipSpaces(<Unknown Source>:Unknown line)
org.apache.xerces.impl.XMLDocumentScannerImpl$TrailingMiscDispatcher.dispatch(<Unknown Source>:Unknown line)
org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(<Unknown Source>:Unknown line)
org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
org.apache.xerces.parsers.DTDConfiguration.parse(<Unknown Source>:Unknown line)
org.apache.xerces.parsers.XMLParser.parse(<Unknown Source>:Unknown line)
org.apache.xerces.parsers.AbstractSAXParser.parse(<Unknown Source>:Unknown line)
nu.xom.Builder.build(Builder.java:786)
nu.xom.Builder.build(Builder.java:569)
gov.lbl.dsd.firefish.trash.XMLXomBench.main(XMLXomBench.java:62)
I guess the relevant block is:
-----------------------------------------------------------
XMLEntityScanner.load(...):
    ...
    if (changeEntity) {
        fEntityManager.endEntity();
        if (fCurrentEntity == null) {
            throw new EOFException();
        }
        // handle the trailing edges
        if (fCurrentEntity.position == fCurrentEntity.count) {
            load(0, true);
        }
    }
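To make the suggestion concrete, here is a minimal sketch of the alternative control flow, where running out of input is reported through a return value instead of an exception. This is not the xerces code: MiniScanner, its fields, and onlyTrailingSpace are hypothetical stand-ins, and the real scanner has to deal with entity nesting that this toy ignores.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class EndOfInputSketch {
    /** Hypothetical stand-in for an entity scanner; not the xerces class. */
    static final class MiniScanner {
        private final Reader reader;
        private final char[] buf = new char[8]; // tiny buffer to force refills
        private int position = 0;
        private int count = 0;

        MiniScanner(Reader reader) {
            this.reader = reader;
        }

        /** Refills the buffer; returns false at end of input instead of throwing. */
        boolean load() throws IOException {
            int n = reader.read(buf, 0, buf.length);
            position = 0;
            if (n < 0) { // -1 signals end of stream
                count = 0;
                return false;
            }
            count = n;
            return true;
        }

        /** Skips whitespace; returns false if the input ended, true otherwise. */
        boolean skipSpaces() throws IOException {
            while (true) {
                while (position == count) {
                    if (!load()) {
                        return false; // normal end of document, no exception
                    }
                }
                char c = buf[position];
                if (c != ' ' && c != '\n' && c != '\r' && c != '\t') {
                    return true;
                }
                position++;
            }
        }
    }

    /** Returns true if s holds nothing but whitespace up to its end. */
    static boolean onlyTrailingSpace(String s) {
        try {
            return !new MiniScanner(new StringReader(s)).skipSpaces();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(onlyTrailingSpace("  \n\t  "));
        System.out.println(onlyTrailingSpace("   <!-- comment -->"));
    }
}
```

The caller, which here plays the role of the trailing-misc dispatcher, checks the boolean and ends the scan normally, so real I/O failures remain the only case that raises an exception.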
Comments?
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
