"Christopher Spears" <cspears2...@yahoo.com> wrote

I've been working on a way to parse an XML document and
convert it into a Python dictionary. I want to maintain the hierarchy of the XML.

Here is the sample XML I have been working on:

<collection>
 <comic title="Sandman" number='62'>
   <writer>Neil Gaiman</writer>
   <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
   <penciller pages="10-17">Charles Vess</penciller>
 </comic>
</collection>

This is my first stab at this:

#!/usr/bin/env python

from lxml import etree

def generateKey(element):
   if element.attrib:
       key = (element.tag, element.attrib)
   else:
       key = element.tag
   return key

So how are you handling multiple identical tags? From your code it looks
as though the content of an earlier tag will be replaced by the content of
the last tag found with the same key. I would expect your keys to include
some reference to either the parse depth or a sequence count. In your
sample XML the problem never arises, and maybe in your real data it never
will either, but in the general case it is quite common for the same tag
and attribute pair to be used multiple times in a document.
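
One way round the overwriting problem, as a rough sketch only: store a list
of entries per tag instead of a single value, so repeated tags accumulate in
document order. The names SAMPLE and group_children_by_tag are made up for
illustration, and I have used the standard library's xml.etree.ElementTree
in place of lxml so the snippet has no third-party dependency; the calls
used here (fromstring, find, iterating over children, .tag, .attrib, .text)
behave the same way in lxml.etree.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)
from collections import defaultdict

# The sample document from the original post, inlined for the example.
SAMPLE = """<collection>
 <comic title="Sandman" number="62">
   <writer>Neil Gaiman</writer>
   <penciller pages="1-9,18-24">Glyn Dillon</penciller>
   <penciller pages="10-17">Charles Vess</penciller>
 </comic>
</collection>"""


def group_children_by_tag(element):
    """Map each child tag to a list of (attributes, text) pairs.

    Repeated tags are appended to the same list, so nothing is
    overwritten and document order is preserved.
    """
    grouped = defaultdict(list)
    for child in element:
        entry = (dict(child.attrib), (child.text or "").strip())
        grouped[child.tag].append(entry)
    return dict(grouped)


comic = ET.fromstring(SAMPLE).find("comic")
by_tag = group_children_by_tag(comic)
pencillers = by_tag["penciller"]  # two entries, one per <penciller>
```

The position in each list doubles as the sequence count mentioned above, so
two tags with identical names and attributes still end up as separate
entries.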


class parseXML(object):
   def __init__(self, xmlFile = 'test.xml'):
       self.xmlFile = xmlFile

   def parse(self):
       doc = etree.parse(self.xmlFile)
       root = doc.getroot()
       key = generateKey(root)
       dictA = {}
       for r in root.getchildren():
           keyR = generateKey(r)
           if r.text:
               dictA[keyR] = r.text
           if r.getchildren():
               dictA[keyR] = r.getchildren()

The script doesn't descend all of the way down because I'm
not sure how to handle an XML document that may have multiple layers.
Advice, anyone?  Would this be a job for recursion?

Recursion is the classic way to deal with tree structures, so
yes, you could use it here, provided your tree never exceeds
Python's recursion depth limit (I think it's still 1000 levels).
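
To make the recursive idea concrete, here is a minimal sketch, not a
finished converter. The function name element_to_dict and the dictionary
layout (tag, attrib, text, children) are my own choices for illustration,
and again I am using the standard library's xml.etree.ElementTree rather
than lxml; the attributes used work the same way in lxml.etree. Children
are kept in a list, so repeated tags survive and the hierarchy is
preserved. sys.getrecursionlimit() reports the current depth limit.

```python
import sys
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)

# The sample document from the original post, inlined for the example.
SAMPLE = """<collection>
 <comic title="Sandman" number="62">
   <writer>Neil Gaiman</writer>
   <penciller pages="1-9,18-24">Glyn Dillon</penciller>
   <penciller pages="10-17">Charles Vess</penciller>
 </comic>
</collection>"""


def element_to_dict(element):
    """Recursively convert an element and all its descendants to a dict."""
    return {
        "tag": element.tag,
        "attrib": dict(element.attrib),
        # Whitespace-only text between child tags collapses to "".
        "text": (element.text or "").strip(),
        # One recursive call per child; a leaf yields an empty list,
        # which is what ends the recursion.
        "children": [element_to_dict(child) for child in element],
    }


tree = element_to_dict(ET.fromstring(SAMPLE))
comic = tree["children"][0]
names = [child["text"] for child in comic["children"]]

# Each nesting level in the XML costs one Python stack frame, so the
# document depth has to stay below this limit.
limit = sys.getrecursionlimit()
```

For the sample document the recursion only goes three levels deep
(collection, comic, then writer or penciller), so the limit is nowhere near
being a concern.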

I'm not sure how converting etree's tree structure into a dictionary
will help you however. It seems like a lot of work for a small gain.

hth,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
