On Wed, Nov 21, 2007 at 09:02:47AM -0800, Srinivas Iyyer wrote:
> Dear tutors, 
> 
> I use ElementTree for XML work. I have a 1.3GB file
> to parse. 
> 
> 
> It takes a lot of time to open my input XML file. 
> 
> Is that because of my hardware limitation or am I
> using a blunt method to load the file?
> 
> my computer config:
> Intel(R) Pentium(R) 4 CPU 2.80GHz
> 2.79GHz, 0.99GB of RAM
> 
> from elementtree import ElementTree
> myfile = open('myXML.out','r')
> 
> Can you suggest any tip to circumvent the file opening
> problem?

If time is the problem, you might want to look at:

- cElementTree -- See notes about cElementTree on this page:
  http://effbot.org/zone/elementtree-13-intro.htm

- lxml -- http://codespeak.net/lxml/
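
Both are meant to be close to drop-in replacements for ElementTree,
so trying them is mostly a matter of changing the import.  Here is
a minimal sketch of that idea.  The helper name count_tags and the
sample data are made up for illustration, and the import paths are
the ones used by current Python versions, not the standalone
elementtree package from your message.  Note that this only helps
with speed: the whole tree is still built in memory.

```python
import io

# Try the faster implementations first; fall back to the plain one.
try:
    from lxml import etree as ET            # third-party, if installed
except ImportError:
    try:
        import xml.etree.cElementTree as ET  # C version on older Pythons
    except ImportError:
        import xml.etree.ElementTree as ET   # standard library


def count_tags(source, tag):
    """Parse the whole document into a tree and count elements named tag.

    source may be a file name or an open (binary) file object.
    Hypothetical helper, for illustration only.
    """
    tree = ET.parse(source)
    return sum(1 for _ in tree.iter(tag))


# Made-up sample data standing in for a real file.
SAMPLE = b"<root><item/><item/><other/></root>"
n_items = count_tags(io.BytesIO(SAMPLE), "item")
```

The rest of the code only refers to ET, so it does not need to know
which implementation was actually imported.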

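If size/resources/memory are the issue, as must be the case for
you (a tree built from a 1.3GB file will not fit in 0.99GB of
RAM), then SAX can be a solution, because it reports the document
as a stream of events instead of building it in memory.  But
switching to SAX requires a very radical redesign of your
application.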
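
To give an idea of what that redesign looks like, here is a small
sketch using xml.sax from the standard library.  The class
TagCounter, the function count_tags_sax and the sample data are
made up for illustration; your real handler would have to keep
track of whatever state your application needs.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Count start tags with a given name; nothing else is kept."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag it reads.
        if name == self.tag:
            self.count += 1


def count_tags_sax(source, tag):
    """source may be a file name or an open (binary) file object."""
    handler = TagCounter(tag)
    xml.sax.parse(source, handler)
    return handler.count


# Made-up sample data standing in for a real file.
SAMPLE = b"<root><item/><item/><other/></root>"
n_items = count_tags_sax(io.BytesIO(SAMPLE), "item")
```

The parser reads the input piece by piece and calls the handler as
it goes, so memory use stays small no matter how large the file is.
The price is that your code is driven by callbacks and has to
remember for itself where in the document it is.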

You might also want to investigate pulldom.  It's in the Python
standard library.  A quote:

    "PullDOM has 80% of the speed of SAX and 80% of the convenience
    of the DOM. There are still circumstances where you might need
    SAX (speed freak!) or DOM (complete random access). But IMO
    there are a lot more circumstances where the PullDOM middle
    ground is exactly what you need."

The Python standard documentation on pulldom is next to none, but
here are several links:

  http://www.prescod.net/python/pulldom.html
  http://www.ibm.com/developerworks/xml/library/x-tipulldom.html
  http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html
  http://www.idealliance.org/papers/dx_xml03/papers/06-02-03/06-02-03.html#pull
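
And here is a rough sketch of the pulldom style, to show the
"middle ground" the quote talks about.  The function iter_records,
the element name "item" and the sample data are assumptions of
mine; substitute whatever repeated element your file contains.

```python
import io
from xml.dom import pulldom


def iter_records(source, tag):
    """Yield one small DOM node per element named tag.

    source may be a file name or an open (binary) file object.
    Hypothetical helper, for illustration only.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Build a DOM subtree for this one element only.
            events.expandNode(node)
            yield node


# Made-up sample data standing in for a real file.
SAMPLE = (b"<root>"
          b"<item id='1'><name>a</name></item>"
          b"<item id='2'><name>b</name></item>"
          b"</root>")
ids = [node.getAttribute("id")
       for node in iter_records(io.BytesIO(SAMPLE), "item")]
```

You pull events in an ordinary for loop, as with SAX nothing is
built until you ask for it, and when you hit an element you care
about, expandNode gives you a normal DOM node for just that piece.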

Hope this helps.

Dave

-- 
Dave Kuhlman
http://www.rexx.com/~dkuhlman