[Proto-Scripty] Re: Cross-browser function for Text content
Hi Eric and TJ, thanks for your further research in this matter!

> Surprisingly, IE8 still doesn't support it, but even if it did, frankly it doesn't do what _I'd_ want.

Does IE8 claim to be HTML5 compliant?

> $('test').textContent || $('test').innerText
> The only drawback I can think of is that sometimes you may get undefined instead of an empty string (which would be equivalent to false).

If somebody really is annoyed by this drawback, he could write

    $('test').textContent || $('test').innerText || ''

This solution is fine if you only have a handful of text content evaluations, as will usually be the case. But a browser-independent abstraction would still be nice, so that this construct lives in only one place in the code. This is why I posted the question. As you point out in your next posting, an optimization would be to check only once whether the property HTMLElement.textContent is implemented, instead of checking on each text content evaluation.

TJ, thanks for the solution traversing the DOM tree with the nice recursive textValueCollector() function. In the 90% case, only one descent will be necessary. This makes your solution even better: it performs satisfactorily for complex text contents, and it returns quickly in the 90% case.

Thanks and regards,
Rüdiger

-- You received this message because you are subscribed to the Google Groups Prototype script.aculo.us group. To post to this group, send email to prototype-scriptacul...@googlegroups.com. To unsubscribe from this group, send email to prototype-scriptaculous+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/prototype-scriptaculous?hl=en.
[Proto-Scripty] Re: Cross-browser function for Text content
Hi,

On Apr 21, 3:38 pm, Rüdiger Plantiko ruediger.plant...@astrotexte.ch wrote:
>> Surprisingly, IE8 still doesn't support it, but even if it did, frankly it doesn't do what _I'd_ want.
> Does IE8 claim to be HTML5 compliant?

I doubt it (it would be quite a trick, as the HTML5 working group hasn't even stopped accepting new proposals yet), but the second[1] of my textContent references was from six years ago. Microsoft has brought out two major revisions since then, and yet...

> If somebody really is annoyed by this drawback, he could write
> $('test').textContent || $('test').innerText || ''

See Yuriy's links for why those aren't the same thing. :-)

> TJ, thanks for the solution traversing the DOM tree with the nice recursive textValueCollector() function. In the 90% case, only one descent will be necessary. This makes your solution even better: it performs satisfactorily for complex text contents, and it returns quickly in the 90% case.

Thanks. Yeah, actually, it's really quick even in the more complex case, too. I was surprised and pleased by that. :-)

[1] http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

-- T.J. Crowder Independent Software Consultant tj / crowder software / com www.crowdersoftware.com
[Proto-Scripty] Re: Cross-browser function for Text content
Interestingly, Firefox's `textContent` behavior of including the script element's contents (which I called insanity) is *standard* as far as I can tell, and has been for years:
http://www.w3.org/TR/html5/infrastructure.html#textcontent
http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

Surprisingly, IE8 still doesn't support it, but even if it did, frankly it doesn't do what _I'd_ want.

-- T.J.
[Proto-Scripty] Re: Cross-browser function for Text content
What I use in this case is:

    $('test').textContent || $('test').innerText

If textContent is defined and non-empty, it is used; otherwise innerText is used. This would be a lot faster than stripping tags. The only drawback I can think of is that sometimes you may get undefined instead of an empty string (which would be equivalent to false).

Eric
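To make the fallback concrete, here is a minimal sketch. It is hypothetical: `textOf` and `textOfStrict` are illustrative names, and the plain objects stand in for elements (no DOM is involved). It shows the `|| ''` normalization Rüdiger suggests, and one subtlety of `||`: an element whose textContent is the empty string falls through to innerText, whereas a `typeof` check uses textContent whenever the property exists.

```javascript
// Hypothetical helpers; the "elements" are plain objects standing in for DOM nodes.

// The idiom from the thread, normalized so callers always get a string.
function textOf(el) {
  return el.textContent || el.innerText || "";
}

// Variant that prefers textContent whenever the property exists at all,
// even if its value is the empty string.
function textOfStrict(el) {
  if (typeof el.textContent === "string") return el.textContent;
  if (typeof el.innerText === "string") return el.innerText;
  return "";
}

var geckoLike = { textContent: "4711" };              // only textContent
var ieLike = { innerText: "4711" };                   // only innerText
var emptyBoth = { textContent: "", innerText: "x" };  // contrived: values disagree

console.log(textOf(geckoLike), textOf(ieLike)); // 4711 4711
console.log(JSON.stringify(textOf({})));        // ""
console.log(textOf(emptyBoth), JSON.stringify(textOfStrict(emptyBoth))); // x ""
```

The `typeof` variant costs a couple of extra checks per call, which is one more reason to pick the implementation once rather than on every evaluation.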
[Proto-Scripty] Re: Cross-browser function for Text content
On Apr 12, 7:04 pm, T.J. Crowder t...@crowdersoftware.com wrote: Element.addMethods({ text: function(element) { if (!(element = $(element))) return; return element.innerHTML.stripTags(); } }); wouldn't it be wiser to check for the native method once and use it? Something like (untested) Element.addMethods({ text: function(element) { if (!(element = $(element))) return; return element.innerHTML.stripTags(); } if($$('BODY').first().textContent===undefined) { }
[Proto-Scripty] Re: Cross-browser function for Text content
Oooops, gmail sent the message before I finished... :o)
Here is the correct message (please ignore the previous one)

On Apr 12, 7:04 pm, T.J. Crowder t...@crowdersoftware.com wrote:
> Element.addMethods({
>   text: function(element) {
>     if (!(element = $(element))) return;
>     return element.innerHTML.stripTags();
>   }
> });

Wouldn't it be wiser to check for the native method once and use it? Something like (untested):

    Element.addMethods({
      text: ($$('BODY').first().textContent === undefined) ?
        function(element) {
          if (!(element = $(element))) return;
          return element.innerText;
        } :
        function(element) {
          if (!(element = $(element))) return;
          return element.textContent;
        }
    });

Eric

NB: I know, the testing condition is ugly... feel free to post a better one :o)
[Proto-Scripty] Re: Cross-browser function for Text content
On Apr 13, 10:39 am, Eric lefauv...@gmail.com wrote:
> wouldn't it be wiser to check for the native method once and use it?

Probably. I'd also check for innerText (in fact, I'd check for that first), since it's supported by IE, WebKit (so Chrome, Safari), and Opera; only Mozilla holds out. textContent is supported by all of them except IE. So:

    Element.addMethods((function() {

      return {
        /**
         * Element.text() - String
         *
         * Gets the text within the element, ignoring any tags (essentially the sum of all of the
         * text nodes within).
         **/
        text: (function() {
          var element, testvalue;

          // Probe a throwaway element once to find out which approach works here
          element = document.createElement('span');
          element.innerHTML = testvalue = 'foo';
          if (text_fromInnerText(element) == testvalue) {
            return text_fromInnerText;
          }
          if (text_fromTextContent(element) == testvalue) {
            return text_fromTextContent;
          }
          return text_fromStripping;
        })()
      };

      // The function declarations below are hoisted, so they are callable above.

      // Get the element's inner text via innerText if available (IE, WebKit, Opera, ...)
      function text_fromInnerText(element) {
        if (!(element = $(element))) return;
        return element.innerText;
      }

      // Get the element's inner text via textContent if available (Gecko, WebKit, Opera, ...)
      function text_fromTextContent(element) {
        if (!(element = $(element))) return;
        return element.textContent;
      }

      // Get the element's inner text by getting innerHTML and stripping tags (fallback)
      function text_fromStripping(element) {
        if (!(element = $(element))) return;
        return element.innerHTML.stripTags();
      }
    })());

Do people think I should submit this to core? jQuery has an equivalent function, and I think I saw one in Closure as well. So it's not just the OP who wants to do this...

-- T.J. :-)
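The detect-once idea can be sketched outside a browser. This is an illustrative sketch, not T.J.'s code: `pickTextGetter` and `stripTagsSimple` are hypothetical names, the `probe` argument is a plain object standing in for the `span` whose innerHTML was set to the test value, and the fallback uses a simplified tag-stripping regex rather than Prototype's actual `stripTags`. The point is that the feature test runs once and the chosen function is reused on every call.

```javascript
// Simplified stand-in for tag stripping (not Prototype's exact regex).
function stripTagsSimple(html) {
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, "");
}

// Runs the feature test once and returns the getter to reuse afterwards.
// `probe` stands in for a span whose innerHTML was set to testValue.
function pickTextGetter(probe, testValue) {
  if (probe.innerText === testValue) {
    return function (el) { return el.innerText; };
  }
  if (probe.textContent === testValue) {
    return function (el) { return el.textContent; };
  }
  return function (el) { return stripTagsSimple(el.innerHTML); };
}

// Simulated environments: each probe exposes what that browser family would.
var getText = pickTextGetter({ textContent: "foo" }, "foo"); // Gecko-like
console.log(getText({ textContent: "4711" })); // 4711

var fallback = pickTextGetter({ innerHTML: "foo" }, "foo"); // neither property
console.log(fallback({ innerHTML: "<em>4711</em>" })); // 4711
```

Checking innerText first mirrors the order suggested above; a browser exposing both properties gets the innerText getter.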
[Proto-Scripty] Re: Cross-browser function for Text content
We've been getting these requests in the past. Take a look at, for example: URL: http://groups.google.com/group/prototype-core/browse_thread/thread/8ef26e7cedb43afc/47033b4bc8dc4c74#47033b4bc8dc4c74

I still think that it's not a trivial solution (for the reasons outlined in the post linked above) and so is best handled by a standalone plugin.

And using context-unaware `stripTags` on something like HTML is usually asking for trouble :) (imagine what stripTags would do to a string like this: "foo bar <script>function wrap(html) { return '<div>' + html + '</div>'}</script> baz"; and then there are other elements with a CDATA content model, like STYLE)

-- kangax
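kangax's warning can be checked with a small sketch. The regex below is a simplified, hypothetical stand-in for a context-unaware tag stripper (it is not Prototype's actual `stripTags` implementation), applied to the string from the message above. It removes the `<script>` tags but keeps the script's source as if it were text, and it also strips the `<div>` markup that appears only inside a JavaScript string literal.

```javascript
// Simplified, context-unaware tag stripper (illustrative only).
function naiveStripTags(html) {
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, "");
}

var html = "foo bar <script>function wrap(html) { return '<div>' + html + '</div>'}</script> baz";
var stripped = naiveStripTags(html);

// The script body leaks into the "text", and its string literals are mangled.
console.log(stripped);
// foo bar function wrap(html) { return '' + html + ''} baz
```

A walk over the DOM's text nodes that skips SCRIPT (and STYLE) elements, like the text node gatherer T.J. posts in reply, avoids both problems because it never treats markup as a string.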
[Proto-Scripty] Re: Cross-browser function for Text content
Hi,

On Apr 13, 2:42 pm, kangax kan...@gmail.com wrote:
> We've been getting these requests in the past. Take a look at, for example:
> URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...
> I still think that it's not a trivial solution (for the reasons outlined in the post linked above)...

"Oh my sweet Lord in heaven!" he exclaimed, after reading the stackoverflow answer linked from the above and seeing all of the myriad inconsistencies. Blech. That's useful to know, thanks.

jQuery seems to go with the collect-all-text-nodes answer, completely ignoring innerText and textContent, presumably for these very reasons. It also fails to strip the content of script tags (as FF's textContent does), which seems odd (it doesn't look very optimized, either, but perhaps it's fast enough without).

But I would argue that these reasons are exactly why Prototype should have this feature. This is Prototype's raison d'être: smoothing out various browser differences (and outright insanities, such as including script contents!).

I coded up a simple text node gatherer[1] that omits the contents of script elements, and the performance isn't bad at all. Even using the slowest major browser, it happily gave me the 12k of text content in a moderately complex page (various menus and controls, plus a 580-row, 3-column table containing links) in about a third of a second on my little Atom-class netbook.

I created a bookmarklet[2] of it that reports character count, time, and such, and ran it against a large, complex document (the current all-in-one-page HTML5 specification[3]) using Chrome, which gave me all 2,090,693 characters spread across 86,018 elements (I didn't count all nodes, just elements) in just under two seconds (again on the netbook). Firefox did the same in just under three seconds, and IE7 (after taking several *minutes*, and several script errors, just to load the document) ran the bookmarklet in 12.5 seconds. Pretty decent for IE. :-)

The character counts were identical between Chrome and Firefox; IE saw slightly fewer characters (1,891,293) and elements (85,972), but that could have been down to the script errors. Firefox reported one fewer element than Chrome.

I haven't particularly tested or optimized that code; it's just a starting point. It builds things up in an array and uses #join at the end, which is probably slower for small tasks than jQuery's approach (string concatenation), but probably faster for large tasks (like the HTML spec). I say "probably" in each case because I haven't tested, and I've learned not to make performance assertions without data. :-)

[1] http://pastie.org/917566 (also quoted inline below)
[2] http://pastie.org/917567
[3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE* document)

Code from [1] pasted inline:
* * * *
    Element.addMethods((function() {

      /**
       * Element.textValue() - String
       *
       * Gets the text within the element, ignoring any tags; i.e., returns the sum of all of the
       * text nodes. Omits the text nodes within `script` elements.
       **/
      function textValue(element) {
        if (!(element = $(element))) return;
        var collector = [];
        textValueCollector(element, collector);
        return collector.join(''); // join('') concatenates; join() alone would insert commas
      }

      function textValueCollector(element, collector) {
        var node;
        for (node = element.firstChild; node; node = node.nextSibling) {
          switch (node.nodeType) {
            case 3: // text
            case 4: // cdata
              collector.push(node.nodeValue);
              break;
            case 8: // comment
              break;
            case 1: // element
              if (node.tagName == 'SCRIPT') {
                break;
              }
              // FALL THROUGH TO DEFAULT
            default:
              // Descend
              textValueCollector(node, collector);
              break;
          }
        }
      }

      return {textValue: textValue};
    })());
* * * *

-- T.J. :-)
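The traversal in textValueCollector() can be exercised without a browser. In this sketch the node objects are minimal hypothetical stand-ins (only nodeType, nodeValue, tagName, firstChild and nextSibling), and skipping STYLE as well as SCRIPT is an added assumption prompted by kangax's note about CDATA content models; the code above skips only SCRIPT. For comparison, `allText` collects every text node, which is roughly what the standard textContent does (script contents included).

```javascript
// Minimal stand-ins for DOM nodes (assumption: no browser DOM available).
function textNode(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function commentNode(value) {
  return { nodeType: 8, nodeValue: value, firstChild: null, nextSibling: null };
}
function elementNode(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  // Link children into a firstChild/nextSibling chain, last to first.
  for (var i = children.length - 1; i >= 0; i--) {
    children[i].nextSibling = node.firstChild;
    node.firstChild = children[i];
  }
  return node;
}

// Tags whose contents are not user-visible text (STYLE is an added assumption).
var SKIPPED = { SCRIPT: true, STYLE: true };

function collect(element, collector, skipped) {
  for (var node = element.firstChild; node; node = node.nextSibling) {
    switch (node.nodeType) {
      case 3: // text
      case 4: // CDATA section
        collector.push(node.nodeValue);
        break;
      case 8: // comment: ignored
        break;
      case 1: // element: descend unless its tag is skipped
        if (skipped[node.tagName.toUpperCase()]) break;
        collect(node, collector, skipped);
        break;
      default: // other node types: descend
        collect(node, collector, skipped);
    }
  }
}

// Skips SCRIPT/STYLE, like the textValue() gatherer above.
function textValue(element) {
  var collector = [];
  collect(element, collector, SKIPPED);
  return collector.join(""); // join() with no argument would insert commas
}

// Keeps every text node, roughly like standard textContent.
function allText(element) {
  var collector = [];
  collect(element, collector, {});
  return collector.join("");
}

var span = elementNode("SPAN", [
  textNode("foo "),
  elementNode("EM", [textNode("bar")]),
  commentNode(" note "),
  elementNode("SCRIPT", [textNode("alert(1);")]),
  textNode(" baz")
]);

console.log(textValue(span)); // foo bar baz
console.log(allText(span));   // foo baralert(1); baz
```

Upper-casing tagName before the lookup is a small defensive choice for documents where tag names are reported in lower case.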
[Proto-Scripty] Re: Cross-browser function for Text content
Hi,

> I get the number 4711 in IE with $('test').innerText and in FF with $('test').textContent - does Prototype provide a browser-independent abstraction for this?

Hopefully you get the *string* 4711 rather than the number 4711 (unless you parse it). :-)

`innerHTML` works on all major browsers. It was introduced by IE back in v5 or so, is supported by every major browser, and is now standardized in the HTML5 stuff. Of course, it returns the HTML rather than the text, so given:

    <span id="test"><em>4711</em></span>

$('test').innerHTML will return "<em>4711</em>". If you want just the text with the tags stripped away, Prototype adds `stripTags`[1] to the `String` prototype, so

    $('test').innerHTML.stripTags()

will return "4711". If you wanted to shorten that a bit, you could add a `text` function via Element.addMethods[2]:

    Element.addMethods({
      text: function(element) {
        if (!(element = $(element))) return;
        return element.innerHTML.stripTags();
      }
    });

(That's off-the-cuff, but I think it's correct; more on the addMethods page.)

[1] http://api.prototypejs.org/language/string/prototype/striptags/
[2] http://api.prototypejs.org/dom/element/addmethods/

HTH,
-- T.J. Crowder Independent Software Consultant tj / crowder software / com www.crowdersoftware.com

On Apr 12, 2:05 pm, Rüdiger Plantiko ruediger.plant...@astrotexte.ch wrote:
> Hi there,
> is there a cross-browser function for retrieving the text content of an element? If I have an element like
> <span id="test">4711</span>
> I get the number 4711 in IE with $('test').innerText and in FF with $('test').textContent - does Prototype provide a browser-independent abstraction for this?
> Regards, Rüdiger
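What the innerHTML-plus-stripTags route returns can be sketched on plain strings. The regex is a simplified, hypothetical stand-in for Prototype's `stripTags` (not its actual implementation), and the strings are what innerHTML would serialize for the example markup. One caveat is worth seeing: innerHTML serializes a literal ampersand as `&amp;`, and stripping tags does not decode it, whereas textContent/innerText would give `&`.

```javascript
// Simplified stand-in for String#stripTags (illustrative, not Prototype's regex).
function stripTagsSimple(html) {
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, "");
}

// What innerHTML would return for <span id="test"><em>4711</em></span>:
var inner = "<em>4711</em>";
console.log(stripTagsSimple(inner)); // 4711

// innerHTML of <span><b>Tom</b> &amp; Jerry</span> keeps the entity escaped,
// so stripping tags alone does not yield the plain text "Tom & Jerry".
var escaped = "<b>Tom</b> &amp; Jerry";
console.log(stripTagsSimple(escaped)); // Tom &amp; Jerry
```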
[Proto-Scripty] Re: Cross-browser function for Text content
Hi TJ,

>> I get the number 4711 in IE with $('test').innerText and in FF with $('test').textContent - does Prototype provide a browser-independent abstraction for this?
> Hopefully you get the *string* 4711 rather than the number 4711 (unless you parse it). :-)

You are right: in a posting every word is important, in order to avoid misunderstandings. So, yes: I am getting the string, not the number.

> ... innerHTML ...

Yeah, if the document structure guarantees me that the element in question contains only a text node, then I could use innerHTML equivalently to innerText/textContent.

> Element.addMethods({ text: function(element) { if (!(element = $(element))) return; return element.innerHTML.stripTags(); } });

Thanks for the reference to String.stripTags() - I hadn't realized the existence of such a function before.

Regards,
Rüdiger