I will run a few tests tomorrow on the same machine that I was using, to
keep the results consistent.
I raised this performance issue because, as a C++ user, I use
CodeSynthesis XSD, which parses and validates the same message in a few
seconds.
One idea was to develop a C++ extension (with Boost.Python) that does the
parsing with XSD, but that is not flexible at all. To process the parsed
data in the Python application, you need to wrap every C++ object and value
that is returned to Python. It quickly becomes messy and unmaintainable.
From what I have tested so far, and I agree with Tim, PyXB looks accurate,
since it provides bindings in Python similar to those XSD provides in C++.
However, for advanced users, performance will be critical in some
applications.
Romain Chanu
2009/12/2 Tim Cook <timothywayne.c...@gmail.com>
> Well, I do not think it is unfortunate at this point.
>
> As a new user I would rather have accuracy than speed (though I
> respect that others may be in a different situation).
>
> -Tim
>
>
> On Wed, 2009-12-02 at 06:33 -0700, Peter A. Bigot wrote:
> > Unfortunately, performance took a back seat to validation for the
> > current implementation. The example in examples/tmsxtvd has been the
> > sole performance benchmark so far. On my machine, it shows:
> >
> > vmfed9[26]$ python dumpsample.py
> > Generating binding from tmsdatadirect_sample.xml with minidom
> > minidom first callSign at None
> > Generating binding from tmsdatadirect_sample.xml with SAXDOM
> > SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
> > Generating binding from tmsdatadirect_sample.xml with SAX
> > SAXER first callSign at tmsdatadirect_sample.xml[5:0]
> > DOM-based read 0.000962, parse 0.391175, bind 10.292386, total 10.683561
> > SAXDOM-based parse 1.658077, bind 10.178704, total 11.836781
> > SAX-based read 0.000112, parse and bind 10.605082, total 10.605194
> >
> > These are using three different XML back ends to parse the document, but
> > the same generated bindings and runtime support. As you can see, the
> > bulk of the time is in checking all the content and putting the values
> > into Python objects. The test document here is 205 KB in 10 seconds, so
> > a 6MB document in 90 seconds is faster than I'd thought it might be.
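
As a rough check on why 90 seconds is better than expected, the quoted
DOM-based timing can be scaled linearly to a 6 MB document. This is only a
sketch: it assumes 1 MB = 1024 KB and that time grows linearly with document
size, neither of which is stated in the thread.

```python
# Figures quoted from the DOM-based run above.
SAMPLE_KB = 205.0
SAMPLE_TOTAL_S = 10.683561

# Assumption: 1 MB = 1024 KB, and the file really is 6 MB.
TARGET_KB = 6 * 1024.0


def extrapolate_seconds(sample_kb, sample_s, target_kb):
    """Scale the sample timing linearly with document size."""
    return sample_s * (target_kb / sample_kb)


throughput_kb_s = SAMPLE_KB / SAMPLE_TOTAL_S
estimate_s = extrapolate_seconds(SAMPLE_KB, SAMPLE_TOTAL_S, TARGET_KB)

print(f"sample throughput: {throughput_kb_s:.1f} KB/s")
print(f"linear estimate for 6 MB: {estimate_s:.0f} s (observed: about 90 s)")
```

Under those assumptions the linear estimate comes out at roughly 320
seconds, so the observed 90 seconds suggests the cost depends on the
document's structure as well as its size.
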
> >
> > However, performance is unacceptable for certain applications. There
> > are a couple of approaches. One that I specifically have in mind is to
> > implement an optimized back end that stores values like integers and strings
> > in native Python form rather than in the subclasses that support
> > validation. In that case, validation would become a second, optional,
> > step that you'd have to invoke specifically on each object. The
> > following is the same program, same bindings, but with:
> >
> > pyxb.RequireValidWhenParsing(False)
> >
> > set at the top of the script. That option provides an extremely crude
> > validation bypass, and I can't say it will work correctly in all
> > situations. However, the results are promising (and better than I'd
> > expected):
> >
> > Generating binding from tmsdatadirect_sample.xml with minidom
> > minidom first callSign at None
> > Generating binding from tmsdatadirect_sample.xml with SAXDOM
> > SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
> > Generating binding from tmsdatadirect_sample.xml with SAX
> > SAXER first callSign at tmsdatadirect_sample.xml[5:0]
> > DOM-based read 0.001482, parse 0.398322, bind 2.947036, total 3.345358
> > SAXDOM-based parse 1.677429, bind 2.689278, total 4.366707
> > SAX-based read 0.000217, parse and bind 3.052327, total 3.052544
> >
> > The separate validation step would be something like:
> >
> > pyxb.RequireValidWhenParsing(True)
> > dom_instance.validateBinding()
> >
> > (You must reset the RequireValidWhenParsing flag, or the validateBinding
> > method will immediately succeed.) With this, I get the following
> > additional time for validation:
> >
> > DOM-based validate 1.676465
> > SAXDOM-based validate 1.699026
> > SAX-based validate 1.710580
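
To compare the two passes on another machine, each step can be timed
separately. The following is a minimal sketch, not PyXB code: `timed` and
`parse_then_validate` are hypothetical helpers, and the stand-in callables
exist only so that it runs without PyXB installed.

```python
import time


def timed(step, *args):
    """Run step(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = step(*args)
    return result, time.perf_counter() - start


def parse_then_validate(parse, validate, xml_text):
    """Pass 1 builds the instance, pass 2 validates it; each is timed."""
    instance, parse_s = timed(parse, xml_text)
    _, validate_s = timed(validate, instance)
    return instance, parse_s, validate_s


if __name__ == "__main__":
    # Stand-ins (made-up data). With PyXB, the parse step would run after
    # pyxb.RequireValidWhenParsing(False), and the validate step would call
    # pyxb.RequireValidWhenParsing(True) and then instance.validateBinding(),
    # as described in the quoted message.
    def fake_parse(text):
        return {"callSign": text.strip()}

    def fake_validate(instance):
        return bool(instance["callSign"])

    inst, parse_s, validate_s = parse_then_validate(
        fake_parse, fake_validate, " KDFW ")
    print(f"parse {parse_s:.6f}, validate {validate_s:.6f}, "
          f"total {parse_s + validate_s:.6f}")
```
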
> >
> > The fact that generation plus validation is half the time of generation
> > with validation leaves me skeptical that this is working correctly.
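
The "half the time" observation can be checked directly from the quoted
totals. This sketch only redoes the arithmetic on the numbers above; the
dictionary layout and the function name are illustrative.

```python
# (one-pass total, no-validation total, separate validation), in seconds,
# copied from the three runs quoted above.
RUNS = {
    "DOM": (10.683561, 3.345358, 1.676465),
    "SAXDOM": (11.836781, 4.366707, 1.699026),
    "SAX": (10.605194, 3.052544, 1.710580),
}


def two_pass_ratio(one_pass_s, no_validation_s, validate_s):
    """Fraction of the one-pass time taken by parse plus separate validation."""
    return (no_validation_s + validate_s) / one_pass_s


ratios = {name: two_pass_ratio(*times) for name, times in RUNS.items()}
for name, ratio in ratios.items():
    print(f"{name}: two passes take {ratio:.0%} of the one-pass time")
```

All three back ends land between roughly 45% and 51%, which matches the
quoted estimate and is why the result looks suspicious rather than just
fast.
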
> >
> > However, if that option meets your immediate performance needs, and you
> > can live with either no validation or a second pass, possibly incorrect,
> > validation, that's the best solution I have right now. If you try it,
> > please let us know how it affected the speed; and if it breaks please
> > file a ticket on: http://sourceforge.net/apps/trac/pyxb/
> >
> > I have hopes that a proper optimized back end, with or without
> > validation, will be available in about three months, but I need to see
> > whether the folks I originally developed this for are interested in
> > funding it.
> >
> > Peter
> >
> > Romain CHANU wrote:
> > > Hi,
> > >
> > > Regarding my last email to the mailing list, I was trying to decide
> > > whether to use PyXB or generateDS.
> > >
> > > As a matter of fact, generateDS does not perform any validation
> > > against the XML schema and had some issues creating the bindings
> > > for complex schemas.
> > >
> > > I am now facing a performance issue with PyXB: I parse and validate a
> > > 6 MB file containing XML data. This step takes about 90 seconds...
> > >
> > > Is this normal? Any hints to improve this?
> > >
> > > Thank you.
> > >
> > > Romain Chanu
> > >
> > > ------------------------------------------------------------------------
> > > Join us December 9, 2009 for the Red Hat Virtual Experience,
> > > a free event focused on virtualization and cloud computing.
> > > Attend in-depth sessions from your desk. Your couch. Anywhere.
> > > http://p.sf.net/sfu/redhat-sfdev2dev
> > >
> > > _______________________________________________
> > > pyxb-users mailing list
> > > pyxb-users@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/pyxb-users
> > >
> >
>
>
> --
> ***************************************************************
> Timothy Cook, MSc
>
> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
> Skype ID == (upon request)
> Academic.Edu Profile: http://uff.academia.edu/TimothyCook
>
> You may get my Public GPG key from popular keyservers or
> from this link http://timothywayne.cook.googlepages.com/home
>
>