Hi Joe,
I promised some time ago to look into the extra cost of having an
unused XML Schema validator in the pipeline. I found a number of areas
for improvement, supplied patches for some of them to Sandy Gao, who
applied them a couple of weeks ago, and suggested to him some areas in
which additional improvements could be made.
Some improvements were specific to having a schema validator in the
pipeline, while others improved both the schema validator and other
components. One improvement holds only as long as no schema validation
is actually required; once schema validation has occurred, the cost of
resetting the parser increases.
For the particulars, see the Xerces-J CVS log from June 8 for
XSDHandler.java and from June 4 for ParserConfigurationSettings.java,
XMLSchemaValidator.java and AugmentationsImpl.java.
These changes were included in the Xerces-J 2.0.2 release, so if you
have the time or inclination you might check whether these changes have
improved the situation. This effort is by no means complete; I believe
there will still be a sizable penalty for including the schema validator
in the pipeline, but I'm hopeful that these changes have narrowed the gap
at least a little.
Thanks,
Henry
------------------------------------------------------------------
Henry Zongaro Xalan development
IBM SWS Toronto Lab Tie Line 969-6044; Phone (905) 413-6044
mailto:[EMAIL PROTECTED]
----- Forwarded by Henry Zongaro/Toronto/IBM on 02/06/26 09:30 AM -----
Henry Zongaro
02/05/03 10:38 AM
To: [EMAIL PROTECTED]
cc:
From: Henry Zongaro/Toronto/IBM@IBMCA
Subject: Re: Schema validation -- known performance problem?
Hi Joseph,
The first time the parse() method of a parser instance that uses the
StandardParserConfiguration is invoked with the schema validation feature
set to true, there is some overhead involved in creating a schema
validator for that parser. Among other things, it entails creating a new
DOMParser. Usually the cost of creating parser components is paid when
the parser is constructed, but in this case the cost is deferred until the
first parse().
In addition, there is a further cost for the first schema validator
that is constructed: it is responsible for constructing the various
built-in schema datatype validators.
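To see the deferred setup cost in isolation, one could time two successive parse() calls on the same reader with schema validation turned on. The sketch below uses the standard JAXP/SAX API rather than StandardParserConfiguration directly, and assumes that the underlying parser (Xerces-J on the classpath, or the Xerces-derived parser bundled with the JDK) honours the http://apache.org/xml/features/validation/schema feature. The class name FirstParseCost and the sample document are illustrative only. A single run also folds class loading and JIT effects into the first parse, so treat the numbers as indicative at best.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FirstParseCost {
    // Feature URIs: SAX2 core validation, plus the Xerces schema-validation feature.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String SCHEMA_VALIDATION = "http://apache.org/xml/features/validation/schema";

    // Illustrative schema-less document; validation errors for it are ignored below.
    static final String DOC = "<?xml version=\"1.0\"?><root><item>a</item><item>b</item></root>";

    /** Parses DOC twice with one reader and returns the elapsed nanoseconds of each parse. */
    public static long[] timeTwoParses() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation expects namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(VALIDATION, true);
        reader.setFeature(SCHEMA_VALIDATION, true);

        // DefaultHandler.error() is a no-op, so "no grammar found" style errors are swallowed.
        DefaultHandler quiet = new DefaultHandler();
        reader.setContentHandler(quiet);
        reader.setErrorHandler(quiet);

        long[] elapsed = new long[2];
        for (int i = 0; i < 2; i++) {
            long start = System.nanoTime();
            reader.parse(new InputSource(new StringReader(DOC))); // same reader both times
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        long[] t = timeTwoParses();
        System.out.printf("first parse: %d us, second parse: %d us%n", t[0] / 1000, t[1] / 1000);
    }
}
```

Because the reader is reused, any one-time setup done on the first schema-enabled parse() is paid only once; creating a new parser per document, as in the test case discussed in this thread, pays it every time.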
So you're right that there's some first-time object creation that is
playing a part. It's possible that construction of some of these things
might be deferred until they are known to be needed without a severe
impact on the case in which they are needed.
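One way such construction could be deferred is sketched below. It is a hypothetical illustration in plain Java, not Xerces code: SchemaValidator, buildBuiltinTypes() and the small type table are made-up stand-ins for the real datatype validators. The shared table is created through an initialization-on-demand holder, so it is built at most once per JVM and only when a document that actually references a schema is seen; a schema-less document never triggers it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Xerces code): defer building expensive shared
 * tables until a document actually needs schema validation.
 */
public class LazyValidatorSketch {
    /** Number of times the expensive built-in table has been constructed. */
    static int buildCount = 0;

    /** Stand-in for constructing the built-in schema datatype validators. */
    static Map<String, String> buildBuiltinTypes() {
        buildCount++;
        Map<String, String> types = new HashMap<>();
        types.put("string", "xs:string");
        types.put("decimal", "xs:decimal");
        types.put("dateTime", "xs:dateTime");
        return types;
    }

    /** Initialization-on-demand holder: the JVM initializes BUILTINS on first access. */
    private static final class Holder {
        static final Map<String, String> BUILTINS = buildBuiltinTypes();
    }

    static Map<String, String> builtins() {
        return Holder.BUILTINS;
    }

    /** Stand-in for a schema validator sitting in the pipeline. */
    static final class SchemaValidator {
        private Map<String, String> types; // null until a schema is actually seen

        void startDocument(boolean documentReferencesSchema) {
            // Only documents that reference a schema pay for the datatype tables.
            if (documentReferencesSchema && types == null) {
                types = builtins();
            }
        }

        boolean typesLoaded() {
            return types != null;
        }
    }

    public static void main(String[] args) {
        SchemaValidator v = new SchemaValidator();
        v.startDocument(false);
        System.out.println("after schema-less doc: builds=" + buildCount);
        v.startDocument(true);
        System.out.println("after schema doc: builds=" + buildCount);
    }
}
```

Running main() prints builds=0 after the schema-less document and builds=1 after the schema document. The trade-off is that the first document that does need a schema now pays the construction cost at parse time instead of at construction time.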
Beyond these first-time costs, I think there is some additional
overhead during parse time proper that we might be able to improve upon
fairly readily.
I'll try to spend some time over the next few days to look into
these.
Thanks,
Henry
------------------------------------------------------------------
Henry Zongaro XML Parsers development
IBM SWS Toronto Lab Tie Line 969-6044; Phone (905) 413-6044
mailto:[EMAIL PROTECTED]
Joseph Kesselman/CAM/Lotus@Lotus
02/05/02 02:01 PM
Please respond to xerces-j-dev
To: [EMAIL PROTECTED]
cc:
Subject: Re: Schema validation -- known performance problem?
On Wednesday, 05/01/2002 at 11:24 AST, Elena Litani <[EMAIL PROTECTED]>
wrote:
> Joe,
>
> Joseph Kesselman/CAM/Lotus wrote:
> > HOWEVER -- when I turn on the schema validator, parser performance falls
> > through the floor -- even though none of the test documents references a
> > schema, and only two of them reference a DTD. The parse() operation takes
> > almost twice as long to complete.
>
> This is a single parse(), correct? I mean you did not use any warm-up..?
The testcase I'm running parses about 40 documents. It does instantiate a
new copy of the parser for each one, if that's what you're asking. So no,
this isn't a first-time code-load problem, though it may be a first-time
object-initialization problem.
And as I said, the time difference in these tests is emphatically _NOT_
insignificant: I measured a 2:1 difference.
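A rough harness for this kind of comparison might look like the following. It mirrors the setup described here (about 40 documents, a new parser instance per document) but makes several assumptions: a hypothetical in-memory corpus of small schema-less documents, DTD validation enabled in both runs (a guess at the original baseline), and an untimed warm-up pass so that class loading and JIT compilation are not charged to whichever mode runs first. Validation errors such as "no grammar found" are deliberately ignored by the handler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaOverheadBench {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String SCHEMA_VALIDATION = "http://apache.org/xml/features/validation/schema";

    /** Counts start tags; inherits DefaultHandler's no-op error() so validation errors are ignored. */
    static final class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Hypothetical corpus: n small schema-less documents of three elements each. */
    static String[] corpus(int n) {
        String[] docs = new String[n];
        for (int i = 0; i < n; i++) {
            docs[i] = "<?xml version=\"1.0\"?><doc id=\"" + i + "\"><p>one</p><p>two</p></doc>";
        }
        return docs;
    }

    /** Parses each document with a freshly created reader; returns the total element count. */
    static int parseAll(String[] docs, boolean schema) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        int total = 0;
        for (String doc : docs) {
            XMLReader reader = factory.newSAXParser().getXMLReader(); // new parser per document
            reader.setFeature(VALIDATION, true);
            reader.setFeature(SCHEMA_VALIDATION, schema);
            Counter counter = new Counter();
            reader.setContentHandler(counter);
            reader.setErrorHandler(counter);
            reader.parse(new InputSource(new StringReader(doc)));
            total += counter.elements;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String[] docs = corpus(40);
        parseAll(docs, false); // untimed warm-up for both modes
        parseAll(docs, true);
        for (boolean schema : new boolean[] {false, true}) {
            long start = System.nanoTime();
            int elements = parseAll(docs, schema);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("schema=%b: %d elements in %d us%n", schema, elements, micros);
        }
    }
}
```

Both modes should report 120 elements for the 40-document corpus; only the timings are expected to differ, and a single run on a small corpus is noisy, so repeated runs would be needed before drawing conclusions.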
> Currently we try to validate against both DTDs and XML Schemas. That is
> why we check whether an XML Schema is specified on any element.
I understand the goal. But poor performance in schema mode is going to push
folks away from Xerces...
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]