On May 21, 6:57 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
> byron wrote:
> > I am using the lxml.etree library to validate an XML instance file
> > with a specified schema that contains the data types of each element.
> > This is some of the internals of a function that extracts the
> > elements:
> >
> >     schema_doc = etree.parse(schema_fn)
> >     schema = etree.XMLSchema(schema_doc)
> >
> >     context = etree.iterparse(xml_fn, events=('start', 'end'),
> >                               schema=schema)
> >
> >     # get root
> >     event, root = context.next()
> >
> >     for event, elem in context:
> >         if event == 'end' and elem.tag == self.tag:
> >             yield elem
> >             root.clear()
> >
> > I retrieve a list of elements from this... and do further processing
> > to represent them in different ways. I need to be able to capture the
> > data type from the schema definition for each field in the element.
> > i.e.
> >
> >     <xsd:element name="concept">
> >       <xsd:complexType>
> >         <xsd:sequence>
> >           <xsd:element ref="foo"/>
> >           <xsd:element name="concept_id" type="xsd:string"/>
> >           <xsd:element name="line" type="xsd:integer"/>
> >           <xsd:element name="concept_value" type="xsd:string"/>
> >           <xsd:element ref="some_date"/>
> >         </xsd:sequence>
> >       </xsd:complexType>
> >     </xsd:element>
> >
> > My thought is to recursively traverse the schema definition, match
> > the `name` attribute (since names are unique to a `type`), and
> > return that element. But I can't seem to make it quite work. All the
> > XML is valid, validation works, etc.
> > This is what I have:
> >
> >     def find_node(tree, name):
> >         for c in tree:
> >             if c.attrib.get('name') == name:
> >                 return c
> >             if len(c) > 0:
> >                 return find_node(c, name)
> >         return 0
>
> You're searching the first child and then returning the result, but what
> you're looking for might not be in the first child; if it's not then you
> need to search the next child:
>
>     def find_node(tree, name):
>         for c in tree:
>             if c.attrib.get('name') == name:
>                 return c
>             if len(c) > 0:
>                 r = find_node(c, name)
>                 # compare with None: an element without children is falsy
>                 if r is not None:
>                     return r
>         return None
>
> > I may have been staring at this too long, but when something is
> > returned... it should be returned completely, no? This is what occurs
> > with `return find_node(c, name)` if it returns 0. `return c` works
> > (used pdb to verify that), but the recursion continues and ends up
> > returning 0.
> >
> > Thoughts and/or a different approach are welcome. Thanks
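
To see the difference between the two versions concretely, here is a small self-contained sketch. It rests on a few assumptions: it uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra packages, the schema string is a made-up variant of the snippet above with a top-level `foo` element added, and the name `find_node_first_child_only` is just a label for the original function. The fixed version compares against `None` because an element with no children has historically tested as false in both libraries.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema, loosely based on the snippet in the question.
XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="foo_id" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="concept">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="foo"/>
        <xsd:element name="concept_id" type="xsd:string"/>
        <xsd:element name="line" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def find_node_first_child_only(tree, name):
    """Original logic: returns whatever the first non-empty child yields."""
    for c in tree:
        if c.attrib.get('name') == name:
            return c
        if len(c) > 0:
            # Returns even when the subtree had no match, so later
            # siblings are never visited.
            return find_node_first_child_only(c, name)
    return 0


def find_node(tree, name):
    """Depth-first search that moves on to the next child on a miss."""
    for c in tree:
        if c.attrib.get('name') == name:
            return c
        if len(c) > 0:
            r = find_node(c, name)
            if r is not None:
                return r
    return None


root = ET.fromstring(XSD)
print(find_node_first_child_only(root, 'line'))  # 0: the search died inside "foo"
print(find_node(root, 'line').get('type'))       # xsd:integer
```

The original gives up inside `foo` because that is the first child with children; the fixed version falls through to `concept` and finds `line` there.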
Thanks. Yes, I tried something like this, but I think I overwrote `c`
when I wrote it, as in:

    if len(c) > 0:
        c = find_node(c, name)
        if c is not None:
            return c

Thanks for your help.
--
http://mail.python.org/mailman/listinfo/python-list
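
On the request for a different approach: since the goal is the data type of each field, one option is to skip the hand-written recursion and walk every `xsd:element` once, building a name-to-type lookup. This is only a sketch under assumptions. It uses the standard library's `xml.etree.ElementTree` rather than lxml (lxml elements also provide `iter()`), it relies on the statement above that names are unique, the helper name `element_types` is made up, and elements that use `ref=` or carry no `type` attribute are simply skipped.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Schema fragment from the question, wrapped in a root element so it parses.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="concept">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="foo"/>
        <xsd:element name="concept_id" type="xsd:string"/>
        <xsd:element name="line" type="xsd:integer"/>
        <xsd:element name="concept_value" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def element_types(schema_root):
    """Map each named xsd:element that declares a `type` to that type."""
    types = {}
    # iter() visits the root and all descendants with the given tag,
    # so no explicit recursion is needed.
    for el in schema_root.iter("{%s}element" % XSD_NS):
        name = el.get("name")
        type_ = el.get("type")
        if name is not None and type_ is not None:
            types[name] = type_
    return types


types = element_types(ET.fromstring(SCHEMA))
print(types["line"])  # xsd:integer
```

The lookup is built once per schema, so each field taken from the instance document can then be resolved with a plain dictionary access.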