[issue24079] xml.etree.ElementTree.Element.text does not conform to the documentation
Stefan Behnel added the comment: The "can store arbitrary objects" sentence is now duplicated, and still way too visible. I have to read three sentences before it tells me what I need to know.
Stefan Behnel added the comment: I think the first two sentences can simply be removed to fix this, without loss of readability or information.
Ned Deily added the comment: Thanks for all of your contributions on this. I've committed a version along the lines I suggested along with Martin's example.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed
type: behavior ->
Roundup Robot added the comment:

New changeset d3cda8cf4d42 by Ned Deily in branch '2.7':
Issue #24079: Improve description of the text and tail attributes for
https://hg.python.org/cpython/rev/d3cda8cf4d42

New changeset ad0491f85050 by Ned Deily in branch '3.4':
Issue #24079: Improve description of the text and tail attributes for
https://hg.python.org/cpython/rev/ad0491f85050

New changeset 17ce3486fd8f by Ned Deily in branch '3.5':
Issue #24079: merge from 3.4
https://hg.python.org/cpython/rev/17ce3486fd8f

New changeset 3c94ece57c43 by Ned Deily in branch 'default':
Issue #24079: merge from 3.5
https://hg.python.org/cpython/rev/3c94ece57c43

--
nosy: +python-dev
Robert Collins added the comment: So it is downplayed but it is still documented as being application usable. I'll give this another week for Ned to reply, then commit it in the absence of a reply: I think it's ok as is. I'd be ok with a tweaked version along the lines Ned proposed too: both ways are better than what's in tree today.

--
nosy: +rbcollins
Martin Panter added the comment: I think Ned’s version is an acceptable solution (modulo some punctuation) to the original problem, although I do agree with Stefan that downplaying the generality would be even better. Perhaps we could add a qualifier, like “The *text* attribute [normally] holds . . .”
Stefan Behnel added the comment: Could we apply this patch, please?
Ned Deily added the comment:

I note that the current wording for both text and tail is careful to allow for the most general use of the Element class, that is, that it may be used in non-XML contexts, for example:

  "The text attribute can be used to hold additional data associated with the element. As the name implies this attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found between the element tags."

The proposed patch downplays that generality. How about modifying the original wording so that the description starts something like:

  "These attributes can be used to hold additional [...] application-specific object. If the element is created from an XML file, the *text* attribute holds either the text between the element's start tag and its first child or end tag, or ``None``, and the *tail* attribute holds either the text [...]."
Stefan Behnel added the comment:

> The proposed patch downplays that generality.

That is completely intentional. Almost all readers of the documentation will first need to understand the difference between text and tail before they can go and think about any more advanced use cases that will almost certainly fail on their first serialisation attempts.

The most important aim of the new phrasing is therefore to make that difference clear. Everything else is secondary, although still worth mentioning.
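The serialisation failure mentioned above can be sketched as follows. This is a minimal illustration, assuming the standard-library serializer; the `item` element and the value 42 are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element, used only for illustration.
elem = ET.Element("item")
elem.text = 42  # allowed: the attribute accepts any object

try:
    ET.tostring(elem, encoding="unicode")
    serialised = True
except TypeError:
    # The serializer can only write string content.
    serialised = False

# With a string value the same call succeeds.
elem.text = str(42)
xml_string = ET.tostring(elem, encoding="unicode")
```

Storing the object is accepted silently; the error only surfaces once the tree is written out, which is why the non-string case is easy to get wrong on a first attempt.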
Changes by Martin Panter <vadmium...@gmail.com>:

--
stage: patch review -> commit review
Stefan Behnel added the comment: Looks good to me.
Martin Panter added the comment: Okay, here is a version with most of the wording reverted to Jérôme’s suggestion. I only left my itertext() example, and the grouping of text and tail together. If there are any more bits that are incorrect or unclear please identify them.

--
Added file: http://bugs.python.org/file39606/etree-text.v2.patch
Stefan Behnel added the comment: IMHO less clear and less correct than the previous suggestions.
Stefan Behnel added the comment: Seems like a good idea to explain text and tail in one section, though. That makes tail easier to find for those who are not used to this kind of split (and that's basically everyone who needs to read the docs in the first place).
Martin Panter added the comment:

Another problem with tostring() is that it seems you have to call it with encoding="unicode". Perhaps it would be better to suggest code like "".join(element.itertext())?

I would also improve on Jérôme’s version by making the None case more explicit. And perhaps both attributes can be defined together, rather than giving a half-hearted definition linking between them:

.. attribute:: text
.. attribute:: tail

   The *text* attribute holds any text between the element's begin tag and the next tag. The *tail* attribute holds any text between the element's end tag and the next tag. These attributes are set to ``None`` if there is no text. For example, in the XML data ``<a><b>1<c>2<d/>3</c></b>4</a>``, the *a* element has ``None`` for both *text* and *tail* attributes, the *b* element has *text* ``"1"`` and *tail* ``"4"``, the *c* element has *text* ``"2"`` and *tail* ``None``, the *d* element has *text* ``None`` and *tail* ``"3"``.

   To collect the inner text of an element, use :meth:`itertext`, for example ``"".join(element.itertext())``.

   Applications may store arbitrary objects in these attributes.

--
nosy: +vadmium
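The example from the proposed wording can be checked directly. A small sketch, assuming the XML data is ``<a><b>1<c>2<d/>3</c></b>4</a>`` as described above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")

# Map each tag to its (text, tail) pair, in document order.
pairs = {el.tag: (el.text, el.tail) for el in root.iter()}

# itertext() walks the subtree and skips missing text, so join() is safe.
inner_a = "".join(root.itertext())
inner_b = "".join(root.find("b").itertext())
```

Note that `inner_b` stops at the end tag of *b*: the element's own tail belongs to the parent's inner text, not to its own.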
Raymond Hettinger added the comment:

> this is well formed xml and has nothing to do with tail.

In fact, it does have something to do with tail. The 'TEXT' is captured as the tail of element b:

   >>> root3 = ET.fromstring('<a><b/>TEXT</a>')
   >>> root3[0].tail
   'TEXT'

--
nosy: +eli.bendersky, rhettinger, scoder
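To make the distinction concrete, here is a short sketch contrasting the reported document with a variant where the text comes before the child. The second document is an assumption added for comparison and does not appear in the report.

```python
import xml.etree.ElementTree as ET

# Text after the child: stored on the child's tail.
after_child = ET.fromstring("<a><b/>TEXT</a>")

# Text before the child (hypothetical variant): stored on the parent's text.
before_child = ET.fromstring("<a>TEXT<b/></a>")

after = (after_child.text, after_child[0].tail)
before = (before_child.text, before_child[0].tail)
```

In both documents the string sits between the tags of *a*; only its position relative to *b* decides which attribute receives it.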
Stefan Behnel added the comment:

I agree that the wording in the documentation isn't great:

  text
    The text attribute can be used to hold additional data associated with the element. As the name implies this attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found between the element tags.

  tail
    The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.

Special cases that no-one uses (sticking non-string objects into text/tail) are given too much space and the difference isn't explained as needed.

Since the distinction between text and tail is a (great but) rather special feature of ElementTree, it needs to be given more room in the docs.

Proposal:

  text
    The text attribute holds the immediate text content of the element. It contains any text found up to either the closing tag if the element has no children, or the next opening child tag within the element. For text following an element, see the `tail` attribute. To collect the entire text content of a subtree, see `tostring`. Applications may store arbitrary objects in this attribute.

  tail
    The tail attribute holds any text that directly follows the element. For example, in a document like ``<a>Text<b/>BTail<c/>CTail</a>``, the `text` attribute of the ``a`` element holds the string "Text", and the tail attributes of ``b`` and ``c`` hold the strings "BTail" and "CTail" respectively. Applications may store arbitrary objects in this attribute.
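The same layout also applies when a tree is built by hand rather than parsed. A rough sketch, assuming the document from the proposal above; it builds the tree with `SubElement`, serialises it, and parses the result back.

```python
import xml.etree.ElementTree as ET

a = ET.Element("a")
a.text = "Text"            # text before the first child
b = ET.SubElement(a, "b")
b.tail = "BTail"           # text directly after <b/>
c = ET.SubElement(a, "c")
c.tail = "CTail"           # text directly after <c/>

# Round trip: serialise to a string, then parse it again.
serialised = ET.tostring(a, encoding="unicode")
reparsed = ET.fromstring(serialised)
tails = [child.tail for child in reparsed]
```

Text that follows a child has no other place to go: setting it on the parent's `text` would put it before the first child instead.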
Jérôme Laurens added the comment:

Since the text and tail notions seem tightly coupled, I would vote for a more detailed explanation in the text doc and a forward link in the tail documentation.

  text
    The text attribute holds the text between the element's begin tag and the next tag or None. The tail attribute holds the text between the element's end tag and the next tag or None. For the xml data <a><b>1<c>2<d/>3</c></b>4</a>, the a element has None for both text and tail attributes, the b element has text '1' and tail '4', the c element has text '2' and tail None, the d element has text None and tail '3'.

    To collect the inner text of an element, see `tostring` with method 'text'.

    Applications may store arbitrary objects in this attribute.

  tail
    The tail attribute holds the text between the element's end tag and the next tag or None. See `text` for more details.

    Applications may store arbitrary objects in this attribute.

It is very important to mention that the 'text' attribute does not always hold a string, contrary to what its name would suggest.

BTW, I was not aware of the tostring method with 'text' argument. The fact is that the documentation reads

  "Returns an (optionally) encoded string containing the XML data."

which is misleading because the text is not xml data in general. This also needs to be rephrased or simply removed.
Jérôme Laurens added the comment:

Erratum

   def innertext(elt):
       return (elt.text or '') + ''.join(innertext(e) + (e.tail or '') for e in elt)
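A short comparison of the corrected helper with `tostring(..., method='text')`. This is a sketch that assumes the sample document used elsewhere in this thread; `innertext` is the helper from the erratum, not a library function.

```python
import xml.etree.ElementTree as ET

def innertext(elt):
    # Own text, then each child's inner text followed by that child's tail.
    return (elt.text or "") + "".join(
        innertext(e) + (e.tail or "") for e in elt
    )

root = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = root.find("b")

# tostring() with method="text" also writes the tail of the top element.
with_tail = ET.tostring(b, encoding="unicode", method="text")
without_tail = innertext(b)
```

For *b* the two results differ by the trailing "4", which is the tail of *b* itself.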
Jérôme Laurens added the comment:

The tostring(..., method='text') is not suitable for the inner text because it adds the tail of the top element. A proper implementation would be

   def innertext(elt):
       return (elt.text or '') + ''.join(innertext(e) + e.tail for e in elt)

that can be included in the doc instead of the mention of the tostring trick
New submission from Jérôme Laurens:

The documentation for xml.etree.ElementTree.Element.text reads "If the element is created from an XML file the attribute will contain any text found between the element tags."

   import xml.etree.ElementTree as ET
   root3 = ET.fromstring('<a><b/>TEXT</a>')
   print(root3.text)

CURRENT OUTPUT

   None

"TEXT" is between the element's tags but does not appear in the output.

BTW: this is well formed xml and has nothing to do with tail.

--
components: XML
messages: 242256
nosy: jlaurens
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree.Element.text does not conform to the documentation
type: behavior
versions: Python 3.4
Ned Deily added the comment:

(This issue is a followup to your Issue24072.) Again, while the ElementTree documentation is certainly not nearly as complete as it should be, I don't think this is a documentation error per se. The key issue is: with which element is each text string associated? Perhaps this example will help:

   >>> root4 = ET.fromstring('<a>ATEXT<b>BTEXT</b>BTAIL</a>')
   >>> root4
   <Element 'a' at 0x10224c228>
   >>> root4.text
   'ATEXT'
   >>> root4.tail
   >>> root4[0]
   <Element 'b' at 0x1022ab278>
   >>> root4[0].text
   'BTEXT'
   >>> root4[0].tail
   'BTAIL'

As in your original example, any text following the element b is associated with b's tail attribute until a new tag is found, pushing or popping the tree stack. While the description of the text attribute does not explicitly state this, the tail attribute description immediately following it does. This is also explained in more detail in the ElementTree resources on effbot.org that are linked to from the Python Standard Library documentation. Nevertheless, it probably would be helpful to expand the documentation on this point if someone is willing to put together a documentation patch for review.

With regard to your comment about well formed xml, I don't think there is anything in the documentation that implies (or should imply) that the distinction between the text attribute and the tail attribute has anything to do with whether it is well-formed XML. The tutorial for the third-party lxml package, which provides another implementation of ElementTree, goes into more detail about why, in general, both text and tail are necessary.

https://docs.python.org/3/library/xml.etree.elementtree.html#additional-resources
http://effbot.org/zone/element.htm#text-content
http://lxml.de/tutorial.html#elements-contain-text

--
assignee:  -> docs@python
components: +Documentation -XML
nosy: +docs@python, ned.deily
stage:  -> needs patch
versions: +Python 2.7, Python 3.5