New submission from Terry J. Reedy <[email protected]>:
1. "When you are finished with a DOM, you should clean it up. This is necessary
because some versions of Python do not support garbage collection of objects
that refer to each other in a cycle. Until this restriction is removed from all
versions of Python, it is safest to write your code as if cycles would not be
cleaned up."
This appears to refer to early 2.x CPython versions without the gc module. Such
(cryptic) back references are not appropriate for 3.x docs. Even in 3.x,
immediate unlink might be a good idea, especially for CPython (which would then
clean up immediately). But none of these issues are specific to DOM objects.
Suggested replacement for the above and the current next sentence ("The way to
clean up a DOM is to call its unlink() method:"):
"When you are finished with a DOM, you can call the unlink method to encourage
early cleanup of unneeded objects:"
Anything more is redundant with the doc for the method.
'''
dom1.unlink()
dom2.unlink()
dom3.unlink()
'''
One example at most is quite sufficient.
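For illustration, a single-call version of that example might look like the sketch below. The XML string and the variable names are made up for the example; only parseString() and unlink() come from minidom itself.

```python
from xml.dom import minidom

# Parse a tiny document from a string (the content is invented for this sketch).
dom = minidom.parseString("<root><item>text</item></root>")

# Use the DOM while it is still linked together.
root_tag = dom.documentElement.tagName

# One call is enough: unlink() breaks the internal references between nodes
# so the objects can be reclaimed promptly.
dom.unlink()
```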
2. '''Node.toxml([encoding])
Return the XML that the DOM represents as a string.
With no argument, the XML header does not specify an encoding, and the result
is Unicode string if the default encoding cannot represent all characters in
the document. Encoding this string in an encoding other than UTF-8 is likely
incorrect, since UTF-8 is the default encoding of XML.
With an explicit encoding [1] argument, the result is a byte string in the
specified encoding. It is recommended that this argument is always specified.
To avoid UnicodeError exceptions in case of unrepresentable text data, the
encoding argument should be specified as “utf-8”.
'''
I find this API a bit confusing.
In 3.x, "Return ... a string." means str (unicode), but the rest implies that
'string' should be 'string or bytes'.
"default encoding": what is it? ASCII, UTF-8 (as is almost implied), or something
in the sys module? If the latter, please specify.
A cleaner API would have been 1. always return str (unicode) or 2. always
return bytes, with encoding='utf-8' default or 3. return str if no encoding
given or bytes if one is given, with no default.
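To make the str-versus-bytes question concrete, here is a small sketch of what toxml() returns with and without the encoding argument. It reflects the behavior as I understand it in recent Python 3 releases, not necessarily 3.1; the sample document is invented for the example.

```python
from xml.dom import minidom

# Sample document with one non-ASCII character (content is made up).
dom = minidom.parseString("<greeting>caf\u00e9</greeting>")

# No encoding argument: the result is a str and the XML declaration
# carries no encoding attribute.
as_text = dom.toxml()

# Explicit encoding: the result is a bytes object in that encoding and
# the XML declaration names it.
as_bytes = dom.toxml(encoding="utf-8")

dom.unlink()
```

If this holds, the actual behavior already matches option 3 above, and the doc wording is what needs to catch up.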
3. The following antipattern example should be revised, for 2.x as well:
'''
def getText(nodelist):
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc
'''
should be (not tested, but pretty straightforward)
def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)
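A quick way to check the join-based version is to run it against a small parsed document. The sketch below does that; the sample XML is invented, and the function is the suggested replacement from above, unchanged apart from indentation.

```python
from xml.dom import minidom

def getText(nodelist):
    # Collect the pieces in a list and join once at the end, instead of
    # building the result with repeated string concatenation.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

# Sample document (made up): two text nodes around one child element.
dom = minidom.parseString("<title>Demo <b>bold</b> slideshow</title>")

# Only the direct text children are collected; text inside <b> is skipped.
title_text = getText(dom.documentElement.childNodes)

dom.unlink()
```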
----------
assignee: georg.brandl
components: Documentation
messages: 97244
nosy: georg.brandl, tjreedy
severity: normal
status: open
title: Improve 19.5. xml.dom.minidom doc
versions: Python 3.1, Python 3.2
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue7637>
_______________________________________