Hi Joe,
It's a known issue; this is one reason the schema validator is only in
the pipeline in the StandardParserConfiguration class when schema
validation is enabled. Thus far, no one has had the opportunity to look at
improving the situation, to the best of my knowledge. (Volunteers
welcome!)
In your particular experiment, what parser features were enabled?
After a quick snoop through the code, I see that the schema validator
needs to check, on every element in the document, whether an
xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute is present,
whether there's any schema associated with the namespace of the element,
and so on.
There's probably some fat that can be trimmed there, but I suspect the
recommendation will always be to turn off schema validation if you know
your document doesn't need it.
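
For anyone who wants to try that recommendation from plain JAXP/SAX code, here is a minimal sketch. It assumes a Xerces-derived parser that recognizes the feature identifier http://apache.org/xml/features/validation/schema; the class and method names (SchemaValidationOff, countElements) are made up for illustration and are not part of Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SchemaValidationOff {
    // Xerces feature identifier for schema validation (assumed to be
    // recognized by the parser that JAXP hands back).
    static final String SCHEMA_FEATURE =
            "http://apache.org/xml/features/validation/schema";

    // Counts start-element events so we can see the parse ran to the end.
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            count++;
        }
    }

    // Parses the given XML text with schema validation switched on or off
    // and returns the number of elements seen.
    static int countElements(String xml, boolean schemaValidation)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        try {
            reader.setFeature(SCHEMA_FEATURE, schemaValidation);
        } catch (SAXException e) {
            // A parser that is not Xerces-derived may not know this feature;
            // in that case there is nothing to switch off.
            System.err.println("Feature not available: " + e.getMessage());
        }
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);
        reader.setErrorHandler(counter);
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        int n = countElements("<a><b/><c/></a>", false);
        System.out.println("elements seen: " + n);
    }
}
```

Passing false keeps the schema validator out of the pipeline, which is the case where the per-element checks described above are avoided.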
Thanks,
Henry
------------------------------------------------------------------
Henry Zongaro XML Parsers development
IBM SWS Toronto Lab Tie Line 969-6044; Phone (905) 413-6044
mailto:[EMAIL PROTECTED]
Joseph Kesselman/CAM/Lotus@Lotus
02/04/30 06:25 PM
Please respond to xerces-j-dev

To: [EMAIL PROTECTED]
cc:
Subject: Schema validation -- known performance problem?

I've got an ... interesting ... result on my hands.
I've been experimenting with feeding an application (Xalan) directly from
an XNI stream rather than from a SAX stream. The experimental code's
basically just doing a conversion from XNI to SAX and calling my normal SAX
handlers... plus a bit of additional hacking about if we happen to find a
PSVI annotation.
With schema validation turned off, performance of the XNI setup is
comparable to that of the SAX version -- as expected, since my XNI-to-SAX
conversion is probably very similar to what you folks do.
HOWEVER -- when I turn on the schema validator, parser performance falls
through the floor -- even though none of the test documents references a
schema, and only two of them reference a DTD. The parse() operation takes
almost twice as long to complete.
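
A comparison of this kind can be approximated without a profiler. The following is only a rough sketch, not the code used for the measurements in this message: it goes through plain SAX rather than XNI, the document is synthetic, and the names SchemaValidationTiming, buildDocument and timeParse are hypothetical. It again assumes a Xerces-derived parser that recognizes the schema validation feature identifier.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness, for illustration only.
class SchemaValidationTiming {
    static final String SCHEMA_FEATURE =
            "http://apache.org/xml/features/validation/schema";

    // Builds a schema-less test document with the given number of elements
    // under a single root.
    static String buildDocument(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        return sb.append("</root>").toString();
    }

    // Parses the document 'runs' times and returns the elapsed nanoseconds.
    static long timeParse(String xml, boolean schemaValidation, int runs)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        try {
            reader.setFeature(SCHEMA_FEATURE, schemaValidation);
        } catch (SAXException e) {
            // Not a Xerces-derived parser: both timings will then be equal.
            System.err.println("Feature not available: " + e.getMessage());
        }
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            reader.parse(new InputSource(new StringReader(xml)));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        String xml = buildDocument(20000);
        // Warm-up passes so JIT compilation does not dominate the numbers.
        timeParse(xml, false, 5);
        timeParse(xml, true, 5);
        long off = timeParse(xml, false, 20);
        long on = timeParse(xml, true, 20);
        System.out.printf("schema off: %d ms, schema on: %d ms%n",
                off / 1_000_000, on / 1_000_000);
    }
}
```

Wall-clock timings like these are noisy, so they are only good for spotting a large gap such as the one described above, not for attributing time to individual methods.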
JProbe calls out the following as accounting for most of the difference.
The measurement is how much faster NON-schema-validated is versus
schema-validated, and includes all methods called by the named method. I
haven't attempted to sort them into who-calls-who order, though I suspect
the time sort actually comes pretty close to achieving that. (Apologies in
advance if this doesn't line up nicely on your screen; try a fixed-pitch
font.)

Cumulative time

StandardParserConfiguration.parse(boolean)
        -169872 (-48.7%)
XMLDocumentFragmentScannerImpl.scanDocument(boolean)
        -167192 (-48.4%)
XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(boolean)
        -167182 (-48.4%)
XMLNamespaceBinder.handleStartElement(QName, XMLAttributes, Augmentations,
boolean)
        -112707 (-67.2%)
XMLDocumentFragmentScannerImpl.scanStartElement()
        -112687 (-59.3%)
XMLDTDValidator.startElement(QName, XMLAttributes, Augmentations)
        -112335 (-65.1%)
XMLNamespaceBinder.startElement(QName, XMLAttributes, Augmentations)
        -112305 (-66.8%)

Is this a known issue (possibly already patched)? Or have I botched the
parser configuration somehow?
If the answer is "yes, it's slow and we're working on it" or "we've
improved it in the current CVS code", that's fine... but I thought I should
make sure you were aware of this, and ensure that it wasn't something
particularly stupid in my own code, before I proceeded to work on trying to
optimize my end of things.