Interestingly, Firefox's `textContent` behavior of including the script element's contents (which I called "insanity") is *standard* as far as I can tell -- and has been for years:
http://www.w3.org/TR/html5/infrastructure.html#textcontent
http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

Surprisingly, IE8 still doesn't support it, but even if it did, frankly it doesn't do what _I'd_ want.

-- T.J.

On Apr 13, 4:29 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
> Hi,
>
> On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
>
> > We've been getting these requests in the past. Take a look at, for
> > example:
> > <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
>
> > I still think that it's not a trivial solution (for the reasons
> > outlined in the post linked above)...
>
> "Oh my sweet Lord in heaven!" he exclaimed, after reading the
> Stack Overflow answer linked from the above and seeing all of the
> myriad inconsistencies.
>
> Blech. That's useful to know, thanks.
>
> jQuery seems to go with the collect-all-text-nodes answer, completely
> ignoring innerText and textContent, presumably for these very reasons.
> Like FF's textContent, it also fails to strip the content of script
> tags, which seems odd (doesn't look very optimized, either, but
> perhaps it's fast enough without).
>
> But I would argue that these reasons are exactly why Prototype should
> have this feature. This is Prototype's raison d'etre, smoothing out
> various browser differences (and outright insanities, such as
> including script contents!).
>
> I coded up a simple text node gatherer[1] that omits the contents of
> script elements, and the performance isn't bad at all. Even using the
> slowest major browser, it happily gave me the 12k of text content in a
> moderately-complex page (various menus and controls, plus a 580-row
> 3-column table containing links) in about a third of a second on my
> little Atom-class netbook.
>
> I created a bookmarklet[2] of it that reports character count, time,
> and such and ran it against a large, complex document (the current
> all-in-one-page HTML5 specification[3]) using Chrome, which gave me all
> 2,090,693 characters spread across 86,018 elements (I didn't count all
> nodes, just elements) in just under two seconds (again on the
> netbook). Firefox did the same in just under three seconds, and IE7
> (after taking several *minutes* -- and several script errors -- just
> to load the document) ran the bookmarklet in 12.5 seconds. Pretty
> decent for IE. :-) The character counts were identical between Chrome
> and Firefox; IE saw slightly fewer characters (1,891,293) and elements
> (85,972), but that could have been down to the script errors. Firefox
> reported one fewer element than Chrome.
>
> I haven't particularly tested or optimized that code, it's just a
> starting point. It builds things up in an array and uses #join at the
> end, which is probably slower for small tasks than jQuery's approach
> (string concatenation), but probably faster for large tasks (like the
> HTML spec). I say "probably" in each case because I haven't tested,
> and I've learned not to make performance assertions without data. :-)
>
> [1] http://pastie.org/917566 (also quoted inline below)
> [2] http://pastie.org/917567
> [3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE*
> document)
>
> Code from [1] pasted inline:
> * * * *
> Element.addMethods((function() {
>
>     /**
>      * Element.textValue() -> String
>      *
>      * Gets the text within the element, ignoring any tags; e.g., returns the sum of all of the
>      * text nodes. Omits the text nodes within `script` elements.
>      **/
>     function textValue(element) {
>         if (!(element = $(element))) return;
>         var collector = [];
>         textValueCollector(element, collector);
>         return collector.join("");
>     }
>     function textValueCollector(element, collector) {
>         var node;
>
>         for (node = element.firstChild; node; node = node.nextSibling) {
>             switch (node.nodeType) {
>                 case 3: // text
>                 case 4: // cdata
>                     collector.push(node.nodeValue);
>                     break;
>                 case 8: // comment
>                     break;
>                 case 1: // element
>                     if (node.tagName == 'SCRIPT') {
>                         break;
>                     }
>                     // FALL THROUGH TO DEFAULT
>                 default:
>                     // Descend
>                     textValueCollector(node, collector);
>                     break;
>             }
>         }
>     }
>
>     return {textValue: textValue};
> })());
>
> * * * *
>
> -- T.J. :-)
>
> On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
>
> > We've been getting these requests in the past. Take a look at, for
> > example:
> > <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
>
> > I still think that it's not a trivial solution (for the reasons
> > outlined in the post linked above) and so is best handled by a
> > standalone plugin. And using context-unaware `stripTags` on something
> > like HTML is usually asking for trouble :) (imagine what stripTags
> > would do to a string like this: "foo bar <script>function wrap(html)
> > { return 'div' + html + '</div>'}</script> baz"; and then there are
> > other elements with CDATA content model, like STYLE)
>
> > --
> > kangax
>
> > On Apr 13, 8:20 am, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
>
> > > On Apr 13, 10:39 am, Eric <lefauv...@gmail.com> wrote:
>
> > > > wouldn't it be wiser to check for the native method once and use it?
>
> > > Probably. I'd also check for innerText (in fact, I'd check for that
> > > first), since it's supported by IE, WebKit (so Chrome, Safari), and
> > > Opera; only Mozilla holds out. textContent is supported by all of them
> > > except IE.
> > > So:
>
> > > Element.addMethods((function() {
>
> > >     return {
> > >         /**
> > >          * Element.text() -> String
> > >          *
> > >          * Gets the text within the element, ignoring any tags (essentially the sum of all of the
> > >          * text nodes within).
> > >          **/
> > >         text: (function() {
> > >             var element, testvalue;
>
> > >             element = document.createElement("span");
> > >             element.innerHTML = testvalue = "foo";
> > >             if (text_fromInnerText(element) == testvalue) {
> > >                 return text_fromInnerText;
> > >             }
> > >             if (text_fromTextContent(element) == testvalue) {
> > >                 return text_fromTextContent;
> > >             }
> > >             return text_fromStripping;
> > >         })()
> > >     };
>
> > >     // Get the element's inner text via innerText if available (IE, WebKit, Opera, ...)
> > >     function text_fromInnerText(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.innerText;
> > >     }
>
> > >     // Get the element's inner text via textContent if available (Gecko, WebKit, Opera, ...)
> > >     function text_fromTextContent(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.textContent;
> > >     }
>
> > >     // Get the element's inner text by getting innerHTML and stripping tags (fallback)
> > >     function text_fromStripping(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.innerHTML.stripTags();
> > >     }
>
> > > })());
>
> > > Do people think I should submit this to core? jQuery has an equivalent
> > > function, and I think I saw one in Closure as well. So it's not just
> > > the OP who wants to do this...
>
> > > -- T.J. :-)
>
> > > On Apr 13, 10:39 am, Eric <lefauv...@gmail.com> wrote:
>
> > > > Oooops, gmail sent the message before I finished... :o)
>
> > > > Here is the correct message (please ignore the previous one)
>
> > > > On Apr 12, 7:04 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
>
> > > > > Element.addMethods({
> > > > >     text: function(element) {
> > > > >         if (!(element = $(element))) return;
> > > > >         return element.innerHTML.stripTags();
> > > > >     }
> > > > > });
>
> > > > wouldn't it be wiser to check for the native method once and use it?
>
> > > > Something like (untested)
>
> > > > Element.addMethods({
> > > >     text: ($$('BODY').first().textContent===undefined)
> > > >         ? function(element) { if (!(element = $(element))) return; return element.innerText; }
> > > >         : function(element) { if (!(element = $(element))) return; return element.textContent; }
> > > > });
>
> > > > Eric
>
> > > > NB: I know, the testing condition is ugly... feel free to post a
> > > > better one :o)
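
For anyone who wants to poke at the text-node-gathering idea outside Prototype, here is a minimal standalone sketch of it. This is not the code from [1] and not anything in Prototype: `gatherText`, its `skip` parameter, and the `el`/`text`/`comment` fake-node helpers are all hypothetical names made up for illustration. The helpers exist only so the sketch runs without a browser; in a page you would pass a real DOM element instead. Unlike the gatherer quoted above, this version only descends into element nodes, and the skip list is configurable so that STYLE (the other CDATA-content element kangax mentions) can be left out too.

```javascript
// Sketch only: collect text/CDATA node values under `root`, skipping the
// contents of any element whose upper-cased tag name is a key of `skip`.
// `gatherText` and `skip` are hypothetical names, not Prototype API.
function gatherText(root, skip) {
  skip = skip || { SCRIPT: true };   // default mirrors the thread: omit script contents
  var parts = [];
  (function walk(parent) {
    for (var node = parent.firstChild; node; node = node.nextSibling) {
      if (node.nodeType === 3 || node.nodeType === 4) {
        parts.push(node.nodeValue);  // text (3) or CDATA (4)
      } else if (node.nodeType === 1 &&
                 !skip[String(node.tagName).toUpperCase()]) {
        walk(node);                  // element not on the skip list: descend
      }
      // comments (8) and any other node types are ignored
    }
  })(root);
  return parts.join("");             // array + join, as in the thread
}

// Hypothetical stand-ins for DOM nodes, just enough for gatherText to walk.
function el(tagName, children) {
  var node = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  for (var i = children.length - 1; i >= 0; i--) {
    children[i].nextSibling = node.firstChild;
    node.firstChild = children[i];
  }
  return node;
}
function text(value) { return { nodeType: 3, nodeValue: value, nextSibling: null }; }
function comment(value) { return { nodeType: 8, nodeValue: value, nextSibling: null }; }

// A tree shaped like kangax's problem string, plus a comment and a style element.
var sample = el("DIV", [
  text("foo bar "),
  el("SCRIPT", [text("function wrap(html) {}")]),
  comment("ignored"),
  text(" baz"),
  el("style", [text("p{}")])
]);

var withDefaultSkip = gatherText(sample);
var withStyleSkipped = gatherText(sample, { SCRIPT: true, STYLE: true });
var withNothingSkipped = gatherText(sample, {});
```

With the default skip list the script body is dropped but the style body is kept; passing `{ SCRIPT: true, STYLE: true }` drops both, and passing `{}` reproduces the include-everything behaviour discussed at the top of this message.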