Interesting. Your results confirm the anomaly: parsing and validating in separate steps is significantly faster than doing both in one pass. Some questions:
*) Are the documents using the same bindings generated from the same schema?
*) Are they of comparable complexity?
*) If you run the same test several times in a row, are the numbers stable? (i.e., is there a one-time cache-priming cost included? A quick timing sketch for this is at the end of this message.)
*) Can you repeat the tests using the same document on both machines?

The memory hog in PyXB is probably the finite-state validator. Validation while the binding instances are being built is a straightforward walk through the state graph; validation after generation uses a different algorithm that allows back-tracking when a transition ends in a non-accepting state, which is likely to take more memory. The memory use depends on what sort of constructs you have in your schema/document: e.g., choice or all model groups (which support backtracking) take more memory, and wildcard matches are checked last, so they take longer to process.

I would be interested in doing some deeper analysis if you could send me, or allow me to download, your schema and one or two sample documents.

Peter

Romain CHANU wrote:
> Hi,
>
> I come back to you with a few values from the performance tests using
> this piece of code:
>
>     try:
>         # Measure process time for parsing and validating
>         print ">> Parse and validate"
>         t0 = time.clock()
>         parse_and_validate = my_bindings.CreateFromDocument(file_test)
>         print time.clock() - t0, "seconds process time\n"
>
>         # Measure process time for parsing without validation
>         pyxb.RequireValidWhenParsing(False)
>         print ">> Parse without validation"
>         t1 = time.clock()
>         dom_instance = my_bindings.CreateFromDocument(file_test)
>         print time.clock() - t1, "seconds process time\n"
>
>         # Measure process time for validating the bindings
>         pyxb.RequireValidWhenParsing(True)
>         print ">> Validate the bindings"
>         t2 = time.clock()
>         dom_instance.validateBinding()
>         print time.clock() - t2, "seconds process time"
>
>     except Exception, detail:
>         print 'Error: ', detail
>         pass
>
>     sys.exit(0)
>
> 1) The first test was done on a Xeon 2.67 GHz with 4 GB RAM and
> WinXP. The XML file is 7.92 MB on disk. Here are the results:
>
> - Parse and validate: ~92s
> - Parse without validation: ~52s
> - Validate the bindings: ~12s
>
> 2) The second test was done on a Core 2 Duo 2.4 GHz with 4 GB RAM
> and Win7. The XML file is 5.45 MB on disk. Here are the results:
>
> - Parse and validate: ~154s
> - Parse without validation: ~52s
> - Validate the bindings: ~24s
>
> Both systems use Python 2.6.4.
>
> (FYI: with CodeSynthesis XSD, it takes about 3s to parse and validate
> the file on the first machine under Linux.)
>
> What surprises me in these tests is that:
>
> - "Parse and validate" on the second machine takes much longer than on
>   the first machine (a difference in the number of cores?)
> - "Parse without validation" takes roughly the same time on both machines
>
> I also checked the memory usage in the task manager: the peak is
> around 350 MB (can you explain why?)
>
> I hope this helps. Any comments?
>
> Cheers,
>
> Romain Chanu
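
P.S. On the question about run-to-run stability: a rough sketch along these lines should show whether the first pass carries a one-time setup cost. It reuses the names and calls from your script; "sample.xml" and the loop count are just placeholders for your test document and however many repetitions you care to run.

    import time

    import pyxb
    import my_bindings

    # Read the document once so file I/O is not part of the measurement.
    xml_text = open('sample.xml', 'rb').read()

    pyxb.RequireValidWhenParsing(True)
    for run in range(1, 4):
        # Parse and validate in one step, as in your first measurement.
        t0 = time.clock()
        instance = my_bindings.CreateFromDocument(xml_text)
        parse_and_validate = time.clock() - t0

        # Re-validate the already-built bindings, as in your third measurement.
        t1 = time.clock()
        instance.validateBinding()
        revalidate = time.clock() - t1

        print "run %d: parse+validate %.1fs, validateBinding %.1fs" % (
            run, parse_and_validate, revalidate)

If the first run is noticeably slower than the later ones, the difference is one-time setup rather than per-document work, and the later runs are the numbers worth comparing across the two machines.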