[issue7637] Improve 19.5. xml.dom.minidom doc

Terry J. Reedy Mon, 04 Jan 2010 18:44:04 -0800

New submission from Terry J. Reedy <[email protected]>:

1. "When you are finished with a DOM, you should clean it up. This is necessary 
because some versions of Python do not support garbage collection of objects 
that refer to each other in a cycle. Until this restriction is removed from all 
versions of Python, it is safest to write your code as if cycles would not be 
cleaned up."


This appears to refer to early 2.x CPython versions without the gc module. Such 
(cryptic) back references are not appropriate for 3.x docs. Even in 3.x, 
immediate unlink might be a good idea, especially for CPython (which would then 
clean up immediately). But none of these issues are specific to DOM objects. 
Suggested replacement for the above and the current next sentence ("The way to 
clean up a DOM is to call its unlink() method:"):

"When you are finished with a DOM, you can call the unlink method to encourage 
early cleanup of unneeded objects:"

Anything more is redundant with the doc for the method.
'''
dom1.unlink()
dom2.unlink()
dom3.unlink()
'''
One example at most is quite sufficient.
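
For illustration, a minimal sketch of what a single example could look like. The helper name count_items and the sample XML are hypothetical and not taken from the current docs; the try/finally is just one way to make sure unlink() runs.

```python
from xml.dom import minidom


def count_items(xml_text):
    """Count <item> elements, then release the DOM (hypothetical helper)."""
    dom = minidom.parseString(xml_text)
    try:
        return len(dom.getElementsByTagName("item"))
    finally:
        # Break the parent/child links so the nodes can be freed early.
        dom.unlink()


n_items = count_items("<root><item/><item/></root>")
```

After unlink() the document should not be used again, so anything needed from it has to be extracted first.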

2. '''Node.toxml([encoding]) 
Return the XML that the DOM represents as a string.

With no argument, the XML header does not specify an encoding, and the result 
is Unicode string if the default encoding cannot represent all characters in 
the document. Encoding this string in an encoding other than UTF-8 is likely 
incorrect, since UTF-8 is the default encoding of XML.

With an explicit encoding [1] argument, the result is a byte string in the 
specified encoding. It is recommended that this argument is always specified. 
To avoid UnicodeError exceptions in case of unrepresentable text data, the 
encoding argument should be specified as “utf-8”.
'''
I find this API a bit confusing.

In 3.x, "Return ... a string." means str (unicode), but the rest implies that 
'string' should be 'string or bytes'.

"default encoding": what is it? ASCII, UTF-8 (as is almost implied), or something 
in the sys module? If the latter, please specify.

A cleaner API would have been to 1. always return str (unicode), or 2. always 
return bytes, with encoding='utf-8' as the default, or 3. return str if no 
encoding is given and bytes if one is given, with no default.
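
A quick way to check what toxml() actually returns is sketched below. It assumes a recent CPython 3 release, where the behavior appears to match option 3 (str without an encoding, bytes with one); the 3.1 behavior discussed above may differ, and the sample document is made up.

```python
from xml.dom import minidom

dom = minidom.parseString("<root>caf\u00e9</root>")

# No encoding argument: expect a str whose XML header names no encoding.
as_text = dom.toxml()

# Explicit encoding: expect bytes, with the encoding named in the header.
as_bytes = dom.toxml(encoding="utf-8")

print(type(as_text).__name__, type(as_bytes).__name__)
dom.unlink()
```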

3. The following antipattern example should be revised, for 2.x as well:
'''
def getText(nodelist):
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc
'''
should be (not tested, but pretty straightforward):
'''
def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)
'''
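
The same join-based idea can be checked against a small document. This is only a sketch: the name get_text, the generator-expression form, and the sample XML are my own choices, not wording proposed for the docs.

```python
from xml.dom import minidom


def get_text(nodelist):
    """Concatenate the data of the text nodes that are directly in nodelist."""
    return "".join(
        node.data for node in nodelist if node.nodeType == node.TEXT_NODE
    )


dom = minidom.parseString("<title>Demo <b>slide</b> show</title>")
# Only the two direct text children are joined; the text inside <b> is skipped.
text = get_text(dom.documentElement.childNodes)
dom.unlink()
```

Building the result with a single join avoids the repeated string concatenation of the original loop.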

----------
assignee: georg.brandl
components: Documentation
messages: 97244
nosy: georg.brandl, tjreedy
severity: normal
status: open
title: Improve 19.5. xml.dom.minidom doc
versions: Python 3.1, Python 3.2

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7637>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
