Unfortunately, performance took a back seat to validation in the 
current implementation.  The example in examples/tmsxtvd has been the 
sole performance benchmark so far.  On my machine, it shows:

  vmfed9[26]$ python dumpsample.py
  Generating binding from tmsdatadirect_sample.xml with minidom
  minidom first callSign at None
  Generating binding from tmsdatadirect_sample.xml with SAXDOM
  SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
  Generating binding from tmsdatadirect_sample.xml with SAX
  SAXER first callSign at tmsdatadirect_sample.xml[5:0]
  DOM-based read 0.000962, parse 0.391175, bind 10.292386, total 10.683561
  SAXDOM-based parse 1.658077, bind 10.178704, total 11.836781
  SAX-based read 0.000112, parse and bind 10.605082, total 10.605194

These runs use three different XML back ends to parse the document, but 
the same generated bindings and runtime support.  As you can see, the 
bulk of the time is spent checking all the content and putting the values 
into Python objects.  The test document here is 205 KB and takes about 
10 seconds; scaling that linearly would predict roughly 300 seconds for a 
6 MB document, so 90 seconds is faster than I'd thought it might be.
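
For a rough sense of scale, the snippet below works through that 
extrapolation using the figures from the SAX run above.  It assumes that 
processing time grows linearly with document size.  That is an 
assumption, not something this single benchmark establishes.

```python
# Figures copied from the benchmark output above (SAX back end, validation on).
SAMPLE_KB = 205          # size of tmsdatadirect_sample.xml
SAMPLE_SECONDS = 10.605  # "SAX-based ... total" time


def projected_seconds(size_kb, sample_kb=SAMPLE_KB, sample_s=SAMPLE_SECONDS):
    """Naive linear extrapolation: assume time is proportional to size."""
    return size_kb / sample_kb * sample_s


throughput = SAMPLE_KB / SAMPLE_SECONDS   # KB processed per second
six_mb_in_kb = 6 * 1024                   # 6 MB expressed in KB

print(f"throughput: {throughput:.1f} KB/s")
print(f"linear estimate for 6 MB: {projected_seconds(six_mb_in_kb):.0f} s")
```

A reported time well under the linear estimate could reflect a faster 
machine or a simpler document.  The arithmetic alone cannot tell which.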

However, performance is unacceptable for certain applications.  There 
are a couple of approaches.  One that I specifically have in mind is to 
implement an optimized back end that stores values like integers and 
strings in native Python form rather than in the subclasses that support 
validation.  In that case, validation would become a second, optional 
step that you'd have to invoke explicitly on each object.  The 
following is the same program, same bindings, but with:

  pyxb.RequireValidWhenParsing(False)

set at the top of the script.  That option provides an extremely crude 
validation bypass, and I can't say it will work correctly in all 
situations.  However, the results are promising (and better than I'd 
expected):

  Generating binding from tmsdatadirect_sample.xml with minidom
  minidom first callSign at None
  Generating binding from tmsdatadirect_sample.xml with SAXDOM
  SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
  Generating binding from tmsdatadirect_sample.xml with SAX
  SAXER first callSign at tmsdatadirect_sample.xml[5:0]
  DOM-based read 0.001482, parse 0.398322, bind 2.947036, total 3.345358
  SAXDOM-based parse 1.677429, bind 2.689278, total 4.366707
  SAX-based read 0.000217, parse and bind 3.052327, total 3.052544
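
As a rough illustration of the trade-off behind the optimized back end 
idea, consider the approach where native values are stored while parsing 
and validation runs as a separate, optional pass.  Here is a toy model 
in plain Python of that approach and of a global on/off switch.  
Everything in it is hypothetical.  require_valid_when_parsing, Record, 
check_port and validate_binding only imitate the behaviour described in 
this message.  They are not PyXB's API or how PyXB implements it.  The 
port range check stands in for whatever facet checks a real schema would 
impose.

```python
"""Toy model: store native values now, validate in a second pass.

Hypothetical sketch only; these names imitate the behaviour described
in the message and are not PyXB code.
"""

_require_valid = True   # module-level switch, on by default


def require_valid_when_parsing(flag):
    """Turn per-value checking on or off globally."""
    global _require_valid
    _require_valid = bool(flag)


def check_port(value):
    """Stand-in for a schema facet check (here: 0 <= value <= 65535)."""
    if not 0 <= value <= 65535:
        raise ValueError(f"{value} is outside 0..65535")
    return value


class Record:
    def __init__(self, port):
        # Checked path validates up front; fast path keeps a plain int.
        self.port = check_port(port) if _require_valid else int(port)

    def validate_binding(self):
        """Optional second pass over values stored without checks."""
        if not _require_valid:
            return True          # switch still off: nothing is checked
        check_port(self.port)
        return True


if __name__ == "__main__":
    require_valid_when_parsing(False)
    rec = Record(70000)              # accepted: no checks while "parsing"
    print(rec.validate_binding())    # True, but only because checks are off
    require_valid_when_parsing(True)
    try:
        rec.validate_binding()
    except ValueError as exc:
        print("second pass caught:", exc)
```

Note that validate_binding() reports success while the switch is off, 
even though the stored value would fail the check.  The second pass only 
does real work once the switch is turned back on.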

The separate validation step would be something like:

  pyxb.RequireValidWhenParsing(True)
  dom_instance.validateBinding()

(You must set the RequireValidWhenParsing flag back to True, or the 
validateBinding method will immediately succeed.)  With this, I get the 
following additional time for validation:

  DOM-based validate 1.676465
  SAXDOM-based validate 1.699026
  SAX-based validate 1.710580

The fact that generation followed by a separate validation pass takes 
about half as long as generation with validation enabled leaves me 
skeptical that this is working correctly.
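
To put a number on "about half", the snippet below recomputes the ratio 
from the timings already shown.  For DOM and SAXDOM the figures are the 
bind step only.  For SAX, parse and bind are measured together.  The 
snippet only quantifies the gap and says nothing about its cause.

```python
# Seconds copied from the runs above. For DOM and SAXDOM these are the
# bind step only; for SAX, parse and bind are measured together.
bind_with_validation = {"DOM": 10.292386, "SAXDOM": 10.178704, "SAX": 10.605082}
bind_without_validation = {"DOM": 2.947036, "SAXDOM": 2.689278, "SAX": 3.052327}
separate_validation = {"DOM": 1.676465, "SAXDOM": 1.699026, "SAX": 1.710580}


def two_pass_ratio(backend):
    """(bind without checks + separate validate) / (bind with checks)."""
    two_pass = bind_without_validation[backend] + separate_validation[backend]
    return two_pass / bind_with_validation[backend]


for name in bind_with_validation:
    print(f"{name:7s} two-pass / one-pass = {two_pass_ratio(name):.2f}")
```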

However, if that option meets your immediate performance needs, and you 
can live with either no validation or a second, possibly incorrect, 
validation pass, that's the best solution I have right now.  If you try 
it, please let us know how it affected the speed, and if it breaks, please 
file a ticket at: http://sourceforge.net/apps/trac/pyxb/

I have hopes that a proper optimized back end, with or without 
validation, will be available in about three months, but I need to see 
whether the folks I originally developed this for are interested in 
funding it.

Peter

Romain CHANU wrote:
> Hi,
>
> Regarding my last email to the mailing list, I was trying to decide 
> whether to use PyXB or generateDS.
>
> As a matter of fact, generateDS does not perform any validation 
> against XML schema and had some issues in the creation of the bindings 
> for complex schemas.
>
> I am now facing a performance issue with PyXB: I parse and validate a 
> 6 MB file containing XML data. This step takes about 90 seconds...
>
> Is this normal? Any hints to improve this?
>
> Thank you.
>
> Romain Chanu
> ------------------------------------------------------------------------
>
> ------------------------------------------------------------------------------
> Join us December 9, 2009 for the Red Hat Virtual Experience,
> a free event focused on virtualization and cloud computing. 
> Attend in-depth sessions from your desk. Your couch. Anywhere.
> http://p.sf.net/sfu/redhat-sfdev2dev
> ------------------------------------------------------------------------
>
> _______________________________________________
> pyxb-users mailing list
> pyxb-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pyxb-users
>   
