Hi Joe,

     I promised some time ago to look into the extra cost of having an
unused XML Schema validator in the pipeline.  I found a number of areas
for improvement and supplied patches for some of them to Sandy Gao, who
applied them a couple of weeks ago.  I also suggested to him some areas
in which additional improvements could be made.

     Some improvements were specific to having a schema validator in the
pipeline; others improved both the schema validator and other components.
One improvement only holds so long as no schema validation is actually
required; once schema validation has occurred, the cost of resetting the
parser increases.

     For the particulars, see the Xerces-J CVS log from June 8 for 
XSDHandler.java and from June 4 for ParserConfigurationSettings.java, 
XMLSchemaValidator.java and AugmentationsImpl.java.

     These changes were included in the Xerces-J 2.0.2 release, so if you 
have the time or inclination you might check whether these changes have 
improved the situation.  This effort is by no means complete; I believe 
there will still be a sizable penalty for including the schema validator 
in the pipeline, but I'm hopeful that these changes have narrowed the gap 
at least a little.

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   Tie Line 969-6044;  Phone (905) 413-6044
mailto:[EMAIL PROTECTED]

----- Forwarded by Henry Zongaro/Toronto/IBM on 02/06/26 09:30 AM -----


Henry Zongaro
02/05/03 10:38 AM


        To:     [EMAIL PROTECTED]
        cc: 
        From:   Henry Zongaro/Toronto/IBM@IBMCA
        Subject:        Re: Schema validation -- known performance problem?
 





Hi Joseph,

     The first time the parse() method of a parser instance that uses the 
StandardParserConfiguration is invoked with the schema validation feature 
set to true, there is some overhead involved in creating a schema 
validator for that parser.  Among other things, it entails creating a new 
DOMParser.  Usually the cost of creating parser components is paid when 
the parser is constructed, but in this case the cost is deferred until the 
first parse().

     On top of that, there's additional cost for the first schema 
validator that is constructed - it's responsible for constructing the 
various datatype validators that are built-in for schema.

     So you're right that there's some first-time object creation that is 
playing a part.  It's possible that construction of some of these things 
might be deferred until they are known to be needed without a severe 
impact on the case in which they are needed.
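
     A minimal sketch of how one might observe the deferred first-parse
cost described above, assuming a JAXP SAX parser backed by Xerces (the
parser bundled with the JDK is Xerces-derived).  The class name, sample
document and run count are hypothetical, not taken from the thread; the
feature URI is the Xerces schema validation feature.  One parser instance
is reused and each parse() is timed separately, so any one-time setup
should show up in the first entry.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class FirstParseCost {
    // Xerces feature id that puts the schema validator in the pipeline.
    static final String SCHEMA_FEATURE =
        "http://apache.org/xml/features/validation/schema";

    // Hypothetical test document: no DTD, no schema reference.
    static final String DOC = "<root><item>text</item></root>";

    /** Parses DOC `runs` times on ONE parser; returns nanoseconds per parse. */
    static long[] timeRepeatedParses(boolean schemaOn, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(SCHEMA_FEATURE, schemaOn);

        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());

        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            reader.parse(new InputSource(new StringReader(DOC)));
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) throws Exception {
        for (boolean schemaOn : new boolean[] {false, true}) {
            long[] nanos = timeRepeatedParses(schemaOn, 5);
            for (int i = 0; i < nanos.length; i++) {
                System.out.println("schema=" + schemaOn + " parse #" + (i + 1)
                    + ": " + nanos[i] / 1000 + " us");
            }
        }
    }
}
```

     Class loading and JIT warm-up also inflate the first measurement, so
the schema-on and schema-off runs are best compared against each other
rather than read in isolation.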

     On top of that, I think there's some additional overhead during 
parse-time proper that we might be able to improve upon fairly readily.

     I'll try to spend some time over the next few days to look into 
these.

Thanks,

Henry
------------------------------------------------------------------
Henry Zongaro      XML Parsers development
IBM SWS Toronto Lab   Tie Line 969-6044;  Phone (905) 413-6044
mailto:[EMAIL PROTECTED]





Joseph Kesselman/CAM/Lotus@Lotus
02/05/02 02:01 PM
Please respond to xerces-j-dev

 
        To:     [EMAIL PROTECTED]
        cc: 
        Subject:        Re: Schema validation -- known performance problem?

 


On Wednesday, 05/01/2002 at 11:24 AST, Elena Litani <[EMAIL PROTECTED]>
wrote:
> Joe,
>
> Joseph Kesselman/CAM/Lotus wrote:
> > HOWEVER -- when I turn on the schema validator, parser performance
> > falls through the floor -- even though none of the test documents
> > references a schema, and only two of them reference a DTD. The parse()
> > operation takes almost twice as long to complete.
>
> This is a single parse(), correct? I mean you did not use any warm-up..?

The testcase I'm running parses about 40 documents. It does instantiate a
new copy of the parser for each one, if that's what you're asking. So no,
this isn't a first-time code-load problem, though it may be a first-time
object-initialization problem.

And as I said, the time difference is emphatically _NOT_ insignificant in
these tests: I measured a 2:1 difference.
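
A rough sketch of the kind of test described above, for anyone who wants
to reproduce the comparison. The actual test harness is not shown in this
thread, so everything here is an assumption: the class name, the in-memory
sample document, and the use of JAXP with a Xerces-backed parser. It
creates a fresh parser for each document and toggles only the Xerces
schema validation feature.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SchemaFeatureTiming {
    // Xerces feature id that puts the schema validator in the pipeline.
    static final String SCHEMA_FEATURE =
        "http://apache.org/xml/features/validation/schema";

    // Hypothetical stand-in for the test documents: no DTD, no schema.
    static final String DOC = "<root><item>text</item></root>";

    /** Parses DOC `docs` times, new parser each time; returns total nanoseconds. */
    static long timeParses(boolean schemaOn, int docs) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < docs; i++) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(SCHEMA_FEATURE, schemaOn);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(DOC)),
                         new DefaultHandler());
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        // Untimed warm-up so code loading does not skew either measurement.
        timeParses(false, 5);
        timeParses(true, 5);

        long off = timeParses(false, 40);
        long on = timeParses(true, 40);
        System.out.println("schema off: " + off / 1_000_000 + " ms");
        System.out.println("schema on:  " + on / 1_000_000 + " ms");
    }
}
```

The warm-up passes are there to separate first-time code loading from the
per-parser object initialization cost discussed in this thread.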

> Currently we try to validate against both: DTDs and XML Schemas. That is
> why we do check if XML Schema is found on some element.

I understand the goal. But poor performance in schema mode is going to
push folks away from Xerces...



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

