Unfortunately, performance took a back seat to validation for the current implementation. The example in examples/tmsxtvd has been the sole performance benchmark so far. On my machine, it shows:

    vmfed9[26]$ python dumpsample.py
    Generating binding from tmsdatadirect_sample.xml with minidom
    minidom first callSign at None
    Generating binding from tmsdatadirect_sample.xml with SAXDOM
    SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
    Generating binding from tmsdatadirect_sample.xml with SAX
    SAXER first callSign at tmsdatadirect_sample.xml[5:0]
    DOM-based read 0.000962, parse 0.391175, bind 10.292386, total 10.683561
    SAXDOM-based parse 1.658077, bind 10.178704, total 11.836781
    SAX-based read 0.000112, parse and bind 10.605082, total 10.605194

These runs use three different XML back ends to parse the document, but the same generated bindings and runtime support. As you can see, the bulk of the time goes to checking all the content and putting the values into Python objects. The test document here is 205 KB and takes about 10 seconds, so 90 seconds for a 6 MB document is faster than I'd thought it might be.

However, performance is unacceptable for certain applications. There are a couple of approaches. One that I specifically have in mind is to implement an optimized back end that stores values like integers and strings in native Python form rather than in the subclasses that support validation. In that case, validation would become a second, optional step that you'd have to invoke specifically on each object.

The following is the same program, same bindings, but with:

    pyxb.RequireValidWhenParsing(False)

set at the top of the script. That option provides an extremely crude validation bypass, and I can't say it will work correctly in all situations.
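
The two-pass idea described above (keep values in native Python form at bind time, validate only on request) can be sketched independently of PyXB. This is an illustration under stated assumptions, not how PyXB is or will be implemented: the Station class, the bind_fast function, and the length and range limits are all hypothetical, and only the callSign element name comes from the sample document.

```python
# Hypothetical sketch of deferred validation. None of these names are PyXB API.
from dataclasses import dataclass


@dataclass
class Station:
    """Holds parsed values as plain str/int, with no checks at construction."""
    call_sign: str
    channel: int

    def validate(self):
        """Second, optional pass: raise ValueError if a constraint fails.

        The limits below are invented for the example; a real binding would
        take them from the schema.
        """
        if not (1 <= len(self.call_sign) <= 10):
            raise ValueError("callSign length out of range: %r" % self.call_sign)
        if not (0 < self.channel < 10000):
            raise ValueError("channel out of range: %r" % self.channel)
        return True


def bind_fast(raw):
    """Build the object from raw text values without checking constraints."""
    return Station(call_sign=raw["callSign"], channel=int(raw["channel"]))


good = bind_fast({"callSign": "WXYZ", "channel": "7"})
bad = bind_fast({"callSign": "", "channel": "7"})  # accepted at bind time

good.validate()  # passes
try:
    bad.validate()  # the empty callSign is only caught here
except ValueError as exc:
    print("deferred validation caught:", exc)
```

The trade-off is the one noted above: binding gets cheaper because no checks run while objects are built, but invalid content goes unnoticed until validate() is called on each object.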
With that flag set, the results are promising (and better than I'd expected):

    Generating binding from tmsdatadirect_sample.xml with minidom
    minidom first callSign at None
    Generating binding from tmsdatadirect_sample.xml with SAXDOM
    SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
    Generating binding from tmsdatadirect_sample.xml with SAX
    SAXER first callSign at tmsdatadirect_sample.xml[5:0]
    DOM-based read 0.001482, parse 0.398322, bind 2.947036, total 3.345358
    SAXDOM-based parse 1.677429, bind 2.689278, total 4.366707
    SAX-based read 0.000217, parse and bind 3.052327, total 3.052544

The separate validation step would be something like:

    pyxb.RequireValidWhenParsing(True)
    dom_instance.validateBinding()

(You must reset the RequireValidWhenParsing flag, or the validateBinding method will immediately succeed.) With this, I get the following additional time for validation:

    DOM-based validate 1.676465
    SAXDOM-based validate 1.699026
    SAX-based validate 1.710580

The fact that generation plus validation takes half the time of generation with validation (about 5.0 seconds versus 10.7 for the DOM-based run) leaves me skeptical that this is working correctly. However, if that option meets your immediate performance needs, and you can live with either no validation or a second-pass, possibly incorrect, validation, that's the best solution I have right now. If you try it, please let us know how it affected the speed; and if it breaks, please file a ticket at:

http://sourceforge.net/apps/trac/pyxb/

I have hopes that a proper optimized back end, with or without validation, will be available in about three months, but I need to see whether the folks I originally developed this for are interested in funding it.

Peter

Romain CHANU wrote:
> Hi,
>
> Regarding my last email to the mailing list, I was trying to decide
> whether to use PyXB or generateDS.
>
> As a matter of fact, generateDS does not perform any validation
> against XML schema and had some issues in the creation of the bindings
> for complex schemas.
>
> I am now facing a performance issue with PyXB: I parse and validate a
> 6 MB file containing XML data. This step takes about 90 seconds...
>
> Is this normal? Any hints to improve this?
>
> Thank you.
>
> Romain Chanu

_______________________________________________
pyxb-users mailing list
pyxb-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pyxb-users