Interesting.  Your results confirm the anomaly that separating the 
parsing and validation steps results in a significant performance 
improvement.  Some questions:

*) Are the documents using the same bindings from the same schema?
*) Are they of comparable complexity?
*) If you run the same test several times in a row, are the numbers 
stable?  (i.e., is there a cache-priming cost included?)
*) Can you repeat the tests using the same document on both machines?
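
On the cache-priming question, one quick check is to time the same call
several times within one process.  This is just a sketch (the
`time_repeated` helper and the stand-in workload are mine, not part of
PyXB); you would substitute something like
`lambda: my_bindings.CreateFromDocument(file_test)`:

```python
import time

def time_repeated(fn, runs=5):
    """Run fn() several times and return a list of per-run timings.

    If the first run is much slower than the rest, a cache-priming
    cost (module imports, schema component resolution, etc.) is
    probably included in your single-run numbers.
    """
    timings = []
    for _ in range(runs):
        start = time.time()
        fn()
        timings.append(time.time() - start)
    return timings

# Stand-in workload; replace with the real parse call, e.g.
#   time_repeated(lambda: my_bindings.CreateFromDocument(file_test))
print(time_repeated(lambda: sum(range(100000)), runs=3))
```

If the numbers settle down after the first run, report the steady-state
figures rather than the first one.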

The memory hog in PyXB is probably the finite state validator.  
Validation while the bindings are generated is a straightforward 
transition through the state graph; validation after generation uses a 
different algorithm that allows back-tracking if the transitions result 
in an unaccepting state, which is likely to take more memory.  The 
memory use depends on what sort of constructs you have in your 
schema/document: e.g., choice or all model groups (which support 
backtracking) take up more memory; wildcard matches are checked last so 
take longer to process.
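
As a toy illustration of why choice groups cost memory (this is *not*
PyXB's actual algorithm, just a sketch of back-tracking content-model
matching): every alternative of a choice has to be explored, so partial
match states are kept alive simultaneously, whereas a deterministic
walk through a plain sequence needs none of that.

```python
def validate(content, model):
    """Toy content-model matcher.

    model is a nested structure of ('seq', [...]), ('choice', [...]),
    or a plain element name; content is a list of element names.
    Choice nodes force the matcher to keep every alternative's partial
    result alive -- the memory cost of supporting back-tracking.
    """
    def match(content, model):
        # Returns every possible remainder of content after consuming model.
        kind = model[0] if isinstance(model, tuple) else None
        if kind == 'seq':
            remainders = [content]
            for part in model[1]:
                remainders = [r2 for r in remainders for r2 in match(r, part)]
            return remainders
        if kind == 'choice':
            # All alternatives are explored: this is the back-tracking.
            return [r for alt in model[1] for r in match(content, alt)]
        # Leaf: a single element name.
        if content and content[0] == model:
            return [content[1:]]
        return []
    return any(r == () for r in match(tuple(content), model))

model = ('seq', [('choice', ['a', 'b']), 'c'])
print(validate(['a', 'c'], model))  # True
print(validate(['b', 'c'], model))  # True
print(validate(['c'], model))       # False
```

In a real validator the saved states carry much more context than a
tuple slice, which is why deep or wide choice/all groups add up.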

I would be interested in doing some deeper analysis if you could send 
me, or allow me to download, your schema and one or two sample documents.

Peter

Romain CHANU wrote:
> Hi,
>
> I'm coming back to you with some numbers from the performance tests, 
> using this piece of code:
>
> try:
>  
>     # Measure process time for parsing and validating
>     print ">> Parse and validate"
>     t0 = time.clock()
>     parse_and_validate = my_bindings.CreateFromDocument(file_test)
>     print time.clock() - t0, "seconds process time\n"
>      
>     # Measure process time for parsing without validation
>     pyxb.RequireValidWhenParsing(False)
>     print ">> Parse without validation"
>     t1 = time.clock()
>     dom_instance = my_bindings.CreateFromDocument(file_test)
>     print time.clock() - t1, "seconds process time\n"
>      
>     # Measure process time for validating the bindings
>     pyxb.RequireValidWhenParsing(True)
>     print ">> Validate the bindings"
>     t2 = time.clock()
>     dom_instance.validateBinding()
>     print time.clock() - t2, "seconds process time"
>
> except Exception, detail:
>  
>     print 'Error: ', detail
>     pass
>  
> sys.exit(0)
>
> 1) The first test is done using a Xeon 2.67 GHz with 4 GB Ram and 
> WinXP. The XML file size on the system is 7.92 MB. Here are the results:
>
> - Parse and validate: ~ 92s
> - Parse without validation: ~52s
> - Validate the bindings: ~12s
>
> 2) The second test is done using a Core 2 Duo 2.4 GHz with 4 GB Ram 
> and Win7. The XML file size on the system is 5.45 MB. Here are the 
> results:
>
> - Parse and validate: ~154s
> - Parse without validation: ~52s
> - Validate the bindings: ~24s
>
> Both systems are using Python 2.6.4.
>
> (FYI: with CodeSynthesis XSD, it takes about 3s to parse and validate 
> the file on the first machine using Linux).
>
> What surprises me in these tests is that:
>
> -  "Parse and validate" on the second machine takes much longer than 
> on the first machine (difference in the number of cores?)
> -  "Parse without validation" takes roughly the same time on both machines
>
> Also, I checked the memory usage in the task manager and the peak is 
> around 350 MB (can you explain why?)
>
> I hope this helps. Any comments?
>
> Cheers,
>
> Romain Chanu



_______________________________________________
pyxb-users mailing list
pyxb-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pyxb-users