[EMAIL PROTECTED] wrote:

I would like to retrieve what is between the tags <node> ...</node> into
strings, the "subelements" being considered as simple strings and not processed
by ElementTree.
In other words, this could be badly formed HTML, left unprocessed, embedded in
well-formed XML tags.

i.e. :
string1 = "This text <thistag> is completely crap </thistag> because
<anothertag> blabla </anothertag>"
string2="This is another <thisnotag> node </thisnotag> with <anothertaggy>
random tags </anothertaggy>"

You say "parse", but your description suggests that you want to serialize the contents of an XML node without the outermost element. Is that correct?

In ET 1.3, you can do this by setting the tag to None and then serializing the node as usual. In 1.2 (as shipped with Python 2.5), you need to post-process the serialized string instead.
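
For the tag-is-None route, here is a minimal sketch. It assumes a current Python 3 `xml.etree.ElementTree`, where the serializer appears to skip the start and end tags of any element whose tag is None; that behaviour is not spelled out in the standard library documentation, so check it on your version. The helper name `inner_xml` is made up for this example, and it works on a shallow copy so the original node keeps its tag.

```python
import copy
import xml.etree.ElementTree as ET


def inner_xml(node):
    """Return the serialized contents of node without its own start/end tags.

    Assumption: the serializer writes only the text and children of an
    element whose tag is None (the ET 1.3 behaviour described above).
    """
    clone = copy.copy(node)   # shallow copy: children are shared, not duplicated
    clone.tag = None          # ask the serializer to omit the outer tags
    clone.tail = None         # tostring() would otherwise append the tail text
    return ET.tostring(clone, encoding="unicode")


root = ET.fromstring(
    "<root><node a='b'>something some other thing <tag>hello</tag> text</node>tail</root>"
)
node = root.find("node")
print(inner_xml(node))
```

Because only the copy is modified, `node.tag` is still "node" afterwards and the tree can be used as before.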

In 1.2, assuming the element you want to serialize is in the variable "node", you can do:

>>> node
<Element node at c770d0>
>>> s = ET.tostring(node)
>>> s
'<node>something some other thing <tag>hello</tag> text</node>'
>>> _, _, s = s.partition(">") # chop off first tag
>>> s, _, _ = s.rpartition("<") # chop off last tag
>>> s
'something some other thing <tag>hello</tag> text'
>>>
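
The same chopping can be wrapped in a small function. This is only a sketch: the name `inner_xml_by_partition` is invented here, and `encoding="unicode"` is a Python 3 addition (not available in the 1.2 release discussed above) so that tostring() returns a str rather than bytes. It relies on the serializer escaping ">" and "<" inside attribute values and text, so the first ">" always ends the start tag and the last "<" always begins the end tag.

```python
import xml.etree.ElementTree as ET


def inner_xml_by_partition(node):
    """Serialize node, then strip its own start and end tags from the string."""
    s = ET.tostring(node, encoding="unicode")  # str instead of bytes on Python 3
    _, _, s = s.partition(">")    # drop everything up to the end of the start tag
    s, _, _ = s.rpartition("<")   # drop the end tag and any tail text after it
    return s


root = ET.fromstring(
    "<root><node id='1'>something some other thing <tag>hello</tag> text</node> tail</root>"
)
print(inner_xml_by_partition(root.find("node")))
```

Unlike fixed-offset slicing, this does not depend on the tag name or on the node having no attributes, and an empty element simply yields an empty string.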

Alternatively, you can "normalize" the node and use ordinary slicing:

>>> node.tag = "node" # make sure we know what it is
>>> node.attrib.clear()
>>> s = ET.tostring(node)
>>> s = s[6:-7] # strip "<node>" (6 chars) and "</node>" (7 chars)
>>> s
'something some other thing <tag>hello</tag> text'
>>>

</F>

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
