Hi.

I was writing an XMLTV parser in Python when I ran into some weirdness that I couldn't explain.

What I'm doing is reading an XML file, creating another DOM object, and copying the elements from one to the other.

At no time do I ever modify the original dom object, yet it gets modified.

Unless I missed something, it sounds like a bug to me.

The XML file is simply:
<?xml version="1.0" encoding="utf-8"?>
<tv><channel id="id1"><display-name lang="en">full name</display-name></channel></tv>

which I store under the name test.xmltv

Here is the code. I've removed everything that isn't relevant to my description; I can't make it any simpler, I'm afraid:

from xml.dom.minidom import Document
import xml.dom.minidom


def adjusttimezone(docxml, timezone):
        doc = Document()
        
        # Create the <tv> base element
        tv_xml = doc.createElement("tv")
        doc.appendChild(tv_xml)

        #Create the channel list
        channellist = docxml.getElementsByTagName('channel')

        for x in channellist:
                #Copy the original attributes
                elem = doc.createElement("channel")
                for y in x.attributes.keys():
                        name = x.attributes[y].name
                        value = x.attributes[y].value
                        elem.setAttribute(name,value)
                for y in x.getElementsByTagName('display-name'):
                        elem.appendChild(y)
                tv_xml.appendChild(elem)
                        
        return doc

if __name__ == '__main__':
        handle = open('test.xmltv','r')
        docxml = xml.dom.minidom.parse(handle)
        print 'step1'
        print docxml.toprettyxml(indent="  ",encoding="utf-8")
        doc = adjusttimezone(docxml, 1000)
        print 'step2'
        print docxml.toprettyxml(indent="  ",encoding="utf-8")

At "step1" I display the content of the DOM object and, quite naturally, it shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1">
   <display-name lang="en">
     full name
   </display-name>
 </channel>
</tv>

After the call to adjusttimezone, however, "step2" shows:
<?xml version="1.0" encoding="utf-8"?>
<tv>
 <channel id="id1"/>
</tv>

That's it!

You'll note that at no time do I modify the content of docxml, yet it gets modified.
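
To rule out the file handling and the attribute loop, here is a smaller sketch that seems to show the same effect. The inlined SOURCE string and the variable names are only for illustration, and I use print() calls so that it also runs on Python 3. It appends one display-name node to an element of a second document and counts the display-name elements under the original channel before and after.

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch needs no file.
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')

docxml = parseString(SOURCE)
channel = docxml.getElementsByTagName('channel')[0]
name = channel.getElementsByTagName('display-name')[0]

# Count the <display-name> elements under the original <channel>.
before = len(channel.getElementsByTagName('display-name'))

# Append the very same node object to an element of a second document.
doc = Document()
elem = doc.createElement('channel')
doc.appendChild(elem)
elem.appendChild(name)

# Count again, and check which element the node now reports as its parent.
after = len(channel.getElementsByTagName('display-name'))
print('before=%d after=%d' % (before, after))
print('parent is the new element: %s' % (name.parentNode is elem))
```

So the node object itself ends up attached to the new element, rather than a copy of it.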

The weirdness disappears if I add "import copy" at the top and change the line
        channellist = docxml.getElementsByTagName('channel')
to
        channellist = copy.deepcopy(docxml.getElementsByTagName('channel'))

However, my understanding is that it shouldn't be necessary.
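
For comparison, here is a sketch of a variant that copies each node individually with cloneNode(True) instead of deep-copying the whole list. I am assuming this is equivalent to the deepcopy change for my purpose; copy_channels and the inlined SOURCE string are names made up for this example, and the unused timezone argument is dropped.

```python
from xml.dom.minidom import Document, parseString

# Same content as test.xmltv, inlined so the sketch needs no file.
SOURCE = ('<tv><channel id="id1">'
          '<display-name lang="en">full name</display-name>'
          '</channel></tv>')


def copy_channels(docxml):
    """Hypothetical helper: build a new <tv> document from copies of the channels."""
    doc = Document()
    tv_xml = doc.createElement('tv')
    doc.appendChild(tv_xml)
    for channel in docxml.getElementsByTagName('channel'):
        elem = doc.createElement('channel')
        # Copy the original attributes.
        for attr_name in channel.attributes.keys():
            elem.setAttribute(attr_name, channel.getAttribute(attr_name))
        for name in channel.getElementsByTagName('display-name'):
            # Append a deep copy, so the original node stays where it is.
            elem.appendChild(name.cloneNode(True))
        tv_xml.appendChild(elem)
    return doc


docxml = parseString(SOURCE)
before = docxml.documentElement.toxml()
doc = copy_channels(docxml)
after = docxml.documentElement.toxml()
print('original unchanged: %s' % (before == after))
```

With this version the original document serializes the same before and after the call, and the new document gets its own display-name elements.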

Any thoughts on this weirdness?

Thanks
Jean-Yves

--
They who would give up an essential liberty for temporary security, deserve neither liberty or security (Benjamin Franklin)

