Hello,

I'm a newbie to XML and just wrote a program that stores my scientific
data objects as an XML file and restores them later (like marshaling).
However, I found it extremely slow... I changed the implementation from
minidom to sax, which speeds things up somewhat (30% or so) for small
files, but not enough. If I go back to using binary data, it is roughly
5 times faster or more. Are there widely used ways to speed up parsing?
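
For reference, the SAX version I have now is shaped roughly like the
sketch below (the element layout and the DataObject/ObjectHandler names
are simplified stand-ins, not my actual code):

import xml.sax

class DataObject:
    """Simplified stand-in for my real data objects."""
    def __init__(self, name, values):
        self.name = name
        self.values = values

class ObjectHandler(xml.sax.ContentHandler):
    """Rebuilds DataObjects from <object name="..."><value>...</value>...</object>."""
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.objects = []          # every restored object ends up in memory here
        self._name = None
        self._values = []
        self._text = []

    def startElement(self, tag, attrs):
        if tag == 'object':
            self._name = attrs.get('name')
            self._values = []
        elif tag == 'value':
            self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, tag):
        if tag == 'value':
            self._values.append(float(''.join(self._text)))
        elif tag == 'object':
            self.objects.append(DataObject(self._name, self._values))

handler = ObjectHandler()
xml.sax.parse('data0.xml', handler)
print(len(handler.objects))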


Another problem is the memory footprint. My XML data file can be large:
tens of megabytes with hundreds of thousands of objects. If I use
xml.sax.parseString(), it parses the whole string into in-memory objects,
which inflates memory use. I only need to loop over the objects in the
XML file once. Are there common ways to do a delayed read? I'm looking
for something like:


xml.sax.parse('data0.xml', myContentHandler)
objects = myContentHandler.getObjects()   # returns an iterator
for obj in objects:    # reading occurs here (delayed reading)
    pass               # do something with obj...

But I haven't found anything like that. I'm not sure this is even
possible with the current architecture of the parsers. Any advice is
highly appreciated.
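
The closest thing I could come up with on my own is to wrap the parser's
incremental feed() interface in a generator, roughly like the sketch below
(iter_objects, handler_factory and the `objects` attribute are names I made
up; it assumes the handler appends each finished object to that list, like
the ObjectHandler sketch above). Would that be a reasonable approach, or is
there a better-supported way to get this kind of iteration?

import xml.sax

def iter_objects(filename, handler_factory, chunk_size=64 * 1024):
    """Yield finished objects lazily while feeding the file to a SAX parser.

    handler_factory must build a ContentHandler that appends completed
    objects to a list attribute named `objects` (my own convention here).
    """
    handler = handler_factory()
    parser = xml.sax.make_parser()       # expat reader; supports incremental feed()
    parser.setContentHandler(handler)
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)           # parse only this chunk
            while handler.objects:       # hand over whatever got completed so far
                yield handler.objects.pop(0)
    parser.close()                       # flush the parser / detect truncated files
    while handler.objects:
        yield handler.objects.pop(0)

# usage (ObjectHandler as in the sketch above, or any handler with an
# `objects` list):
#
#     for obj in iter_objects('data0.xml', ObjectHandler):
#         ...   # objects are built a chunk at a time, not all at once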

Thanks,
Ping

