Hi,

I am coming back to you with some figures from the performance tests, which
used this piece of code:

import sys
import time

import pyxb
import my_bindings  # binding module generated by PyXB from the schema

# file_test holds the XML document text and is loaded before this point.
# Note: in Python 2 on Windows, time.clock() returns wall-clock time,
# not CPU time.
try:
    # Measure time for parsing and validating
    print ">> Parse and validate"
    t0 = time.clock()
    parse_and_validate = my_bindings.CreateFromDocument(file_test)
    print time.clock() - t0, "seconds process time\n"

    # Measure time for parsing without validation
    pyxb.RequireValidWhenParsing(False)
    print ">> Parse without validation"
    t1 = time.clock()
    dom_instance = my_bindings.CreateFromDocument(file_test)
    print time.clock() - t1, "seconds process time\n"

    # Measure time for validating the bindings
    # (the flag must be reset first, otherwise validateBinding() is a no-op)
    pyxb.RequireValidWhenParsing(True)
    print ">> Validate the bindings"
    t2 = time.clock()
    dom_instance.validateBinding()
    print time.clock() - t2, "seconds process time"

except Exception as detail:
    print 'Error: ', detail

sys.exit(0)
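
As a side note on the timing itself: the three measurements above repeat the same start/stop pattern, which could be factored into a small helper. The following is only a sketch, not part of PyXB; the name timed() and the results dictionary are hypothetical. It uses timeit.default_timer, which selects a suitable wall-clock timer for the platform, so the figures are elapsed time rather than CPU time.

```python
from __future__ import print_function

import contextlib
import timeit


@contextlib.contextmanager
def timed(label, results=None):
    """Time the enclosed block and print the elapsed wall-clock seconds.

    'timed' and 'results' are illustrative names, not part of PyXB.
    """
    start = timeit.default_timer()
    try:
        yield
    finally:
        elapsed = timeit.default_timer() - start
        if results is not None:
            # Keep the figure so the phases can be compared afterwards
            results[label] = elapsed
        print("{0}: {1:.3f} s".format(label, elapsed))


# Stand-in workload; in the test above this would be the call to
# my_bindings.CreateFromDocument(file_test) or validateBinding().
timings = {}
with timed("dummy workload", timings):
    total = sum(i * i for i in range(100000))
```

Each of the three phases would then become a single "with timed(...)" block, and the collected timings can be printed or compared at the end.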

1) The first test was run on a Xeon 2.67 GHz with 4 GB RAM and WinXP. The
XML file size on the system is 7.92 MB. Here are the results:

- Parse and validate: ~92s
- Parse without validation: ~52s
- Validate the bindings: ~12s

2) The second test was run on a Core 2 Duo 2.4 GHz with 4 GB RAM and
Win7. The XML file size on the system is 5.45 MB. Here are the results:

- Parse and validate: ~154s
- Parse without validation: ~52s
- Validate the bindings: ~24s

Both systems are using Python 2.6.4.

(FYI: with CodeSynthesis XSD, it takes about 3s to parse and validate the
file on the first machine using Linux).

What surprises me in these tests is that:

-  "Parse and validate" takes a lot more time on the second machine than on
the first (a difference in the number of cores?)
-  "Parse without validation" takes roughly the same time on both machines
Also, I checked the memory usage in the task manager and the peak is around
350 MB (can you explain why?)

I hope this helps. Any comments?

Cheers,

Romain Chanu


2009/12/2 Romain CHANU <romainch...@gmail.com>

> I will perform a few tests tomorrow with the same machine that I was using
> (to be consistent).
>
> I came up with this performance issue because as a C++ user, I use
> CodeSynthesis XSD which parses and validates the same message in a few
> seconds.
>
> One idea was to eventually develop a C++ extension (with Boost Python) to
> do the parsing with XSD, but that is not flexible at all. To process the
> parsed data in the Python application, you need to wrap the different C++
> objects and values that are returned to it. It becomes quite dirty and
> unmaintainable.
>
> From what I have tested so far, and I agree with Tim, PyXB looks accurate
> since it can provide bindings in Python similar to those XSD provides in
> C++. However, for advanced users, performance will be very important, and
> critical in some applications.
>
>
> Romain Chanu
>
> 2009/12/2 Tim Cook <timothywayne.c...@gmail.com>
>
>> Well, I do not think it is unfortunate at this point.
>>
>> As a new user I would rather have accuracy as opposed to speed (though I
>> respect that others may be in a different situation).
>>
>> -Tim
>>
>>
>> On Wed, 2009-12-02 at 06:33 -0700, Peter A. Bigot wrote:
>> > Unfortunately, performance took a back seat to validation for the
>> > current implementation.  The example in examples/tmsxtvd has been the
>> > sole performance benchmark so far.  On my machine, it shows:
>> >
>> >   vmfed9[26]$ python dumpsample.py
>> >   Generating binding from tmsdatadirect_sample.xml with minidom
>> >   minidom first callSign at None
>> >   Generating binding from tmsdatadirect_sample.xml with SAXDOM
>> >   SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
>> >   Generating binding from tmsdatadirect_sample.xml with SAX
>> >   SAXER first callSign at tmsdatadirect_sample.xml[5:0]
>> >   DOM-based read 0.000962, parse 0.391175, bind 10.292386, total 10.683561
>> >   SAXDOM-based parse 1.658077, bind 10.178704, total 11.836781
>> >   SAX-based read 0.000112, parse and bind 10.605082, total 10.605194
>> >
>> > These are using three different XML back ends to parse the document, but
>> > the same generated bindings and runtime support.  As you can see, the
>> > bulk of the time is in checking all the content and putting the values
>> > into Python objects.  The test document here is 205 KB in 10 seconds, so
>> > a 6MB document in 90 seconds is faster than I'd thought it might be.
>> >
>> > However, performance is unacceptable for certain applications.  There
>> > are a couple approaches.  One specifically that I have in mind is to
>> > implement an optimized back end that stores values like integers and strings
>> > in native Python form rather than in the subclasses that support
>> > validation.  In that case, validation would become a second, optional,
>> > step that you'd have to invoke specifically on each object.  The
>> > following is the same program, same bindings, but with:
>> >
>> >   pyxb.RequireValidWhenParsing(False)
>> >
>> > set at the top of the script.  That option provides an extremely crude
>> > validation bypass, and I can't say it will work correctly in all
>> > situations.  However, the results are promising (and better than I'd
>> > expected):
>> >
>> >   Generating binding from tmsdatadirect_sample.xml with minidom
>> >   minidom first callSign at None
>> >   Generating binding from tmsdatadirect_sample.xml with SAXDOM
>> >   SAXDOM first callSign at tmsdatadirect_sample.xml[5:0]
>> >   Generating binding from tmsdatadirect_sample.xml with SAX
>> >   SAXER first callSign at tmsdatadirect_sample.xml[5:0]
>> >   DOM-based read 0.001482, parse 0.398322, bind 2.947036, total 3.345358
>> >   SAXDOM-based parse 1.677429, bind 2.689278, total 4.366707
>> >   SAX-based read 0.000217, parse and bind 3.052327, total 3.052544
>> >
>> > The separate validation step would be something like:
>> >
>> >   pyxb.RequireValidWhenParsing(True)
>> >   dom_instance.validateBinding()
>> >
>> > (You must reset the RequireValidWhenParsing flag, or the validateBinding
>> > method will immediately succeed.)  With this, I get the following
>> > additional time for validation:
>> >
>> >   DOM-based validate 1.676465
>> >   SAXDOM-based validate 1.699026
>> >   SAX-based validate 1.710580
>> >
>> > The fact that generation plus validation is half the time of generation
>> > with validation leaves me skeptical that this is working correctly.
>> >
>> > However, if that option meets your immediate performance needs, and you
>> > can live with either no validation or a second pass, possibly incorrect,
>> > validation, that's the best solution I have right now.  If you try it,
>> > please let us know how it affected the speed; and if it breaks please
>> > file a ticket on: http://sourceforge.net/apps/trac/pyxb/
>> >
>> > I have hopes that a proper optimized back end, with or without
>> > validation, will be available in about three months, but I need to see
>> > whether the folks I originally developed this for are interested in
>> > funding it.
>> >
>> > Peter
>> >
>> > Romain CHANU wrote:
>> > > Hi,
>> > >
>> > > Regarding my last email to the mailing list, I was trying to decide
>> > > whether to use PyXB or generateDS.
>> > >
>> > > As a matter of fact, generateDS does not perform any validation
>> > > against XML schema and had some issues in the creation of the bindings
>> > > for complex schemas.
>> > >
>> > > I am now facing a performance issue with PyXB: I parse and validate a
>> > > 6 MB file containing XML data. This step takes about 90 seconds...
>> > >
>> > > Is this normal? Any hints to improve this?
>> > >
>> > > Thank you.
>> > >
>> > > Romain Chanu
>> > >
>> ------------------------------------------------------------------------
>> > >
>> > >
>> ------------------------------------------------------------------------------
>> > > Join us December 9, 2009 for the Red Hat Virtual Experience,
>> > > a free event focused on virtualization and cloud computing.
>> > > Attend in-depth sessions from your desk. Your couch. Anywhere.
>> > > http://p.sf.net/sfu/redhat-sfdev2dev
>> > >
>> ------------------------------------------------------------------------
>> > >
>> > > _______________________________________________
>> > > pyxb-users mailing list
>> > > pyxb-users@lists.sourceforge.net
>> > > https://lists.sourceforge.net/lists/listinfo/pyxb-users
>> > >
>> >
>> >
>> >
>> >
>>
>>
>> --
>> ***************************************************************
>> Timothy Cook, MSc
>>
>> LinkedIn Profile:http://www.linkedin.com/in/timothywaynecook
>> Skype ID == (upon request)
>> Academic.Edu Profile: http://uff.academia.edu/TimothyCook
>>
>> You may get my Public GPG key from  popular keyservers or
>> from this link http://timothywayne.cook.googlepages.com/home
>>
>>
>