[Proto-Scripty] Re: Cross-browser function for Text content

T.J. Crowder Thu, 15 Apr 2010 06:36:08 -0700

Interestingly, Firefox's `textContent` behavior of including the
script element's contents (which I called "insanity") is *standard* as
far as I can tell -- and has been for years:


http://www.w3.org/TR/html5/infrastructure.html#textcontent
http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

Surprisingly, IE8 still doesn't support it, but even if it did,
frankly it doesn't do what _I'd_ want.
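DOM Level 3 Core defines an element's textContent as the concatenation of the textContent of every child node, excluding comments and processing instructions. Nothing in that rule singles out SCRIPT, which is why script source ends up in the result. Below is a minimal sketch of that rule. It runs over plain-object stand-ins for DOM nodes; the `txt`, `comment` and `el` helpers are hypothetical and exist only so the example runs outside a browser.

```javascript
// Hypothetical stand-ins for DOM nodes: only nodeType, tagName,
// nodeValue, firstChild and nextSibling are modelled.
function txt(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function comment(value) {
  return { nodeType: 8, nodeValue: value, firstChild: null, nextSibling: null };
}
function el(tagName, ...children) {
  // Chain the children through nextSibling, as a real DOM would.
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType: 1, tagName: tagName.toUpperCase(), nodeValue: null,
           firstChild: children[0] || null, nextSibling: null };
}

// textContent as described in DOM Level 3 Core: text, CDATA, comment and
// processing-instruction nodes return their own value; other nodes return
// the concatenation of their children's textContent, skipping comments (8)
// and processing instructions (7). SCRIPT gets no special treatment.
function specTextContent(node) {
  if ([3, 4, 7, 8].includes(node.nodeType)) return node.nodeValue;
  let result = "";
  for (let child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 7 || child.nodeType === 8) continue;
    result += specTextContent(child);
  }
  return result;
}
```

Take a `div` holding the text "Hello ", then a `script` containing "var x = 1;", then a comment, then "world". For that `div` the sketch yields "Hello var x = 1;world". The comment disappears, but the script source does not.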

-- T.J.

On Apr 13, 4:29 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
> Hi,
>
> On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
>
> > We've been getting these requests in the past. Take a look at, for
> > example: 
> > <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
>
> > I still think that it's not a trivial solution (for the reasons
> > outlined in the post linked above)...
>
> "Oh my sweet Lord in heaven!" he exclaimed, after reading the
> stackoverflow answer linked from the above and seeing all of the
> myriad inconsistencies.
>
> Blech. That's useful to know, thanks.
>
> jQuery seems to go with the collect-all-text-nodes answer, completely
> ignoring innerText and textContent, presumably for these very reasons.
> It also doesn't omit the contents of script tags (just as FF's
> textContent doesn't), which seems odd. It doesn't look very optimized,
> either, but perhaps it's fast enough without optimization.
>
> But I would argue that these reasons are exactly why Prototype should
> have this feature. This is Prototype's raison d'etre, smoothing out
> various browser differences (and outright insanities, such as
> including script contents!).
>
> I coded up a simple text node gatherer[1] that omits the contents of
> script elements, and the performance isn't bad at all. Even using the
> slowest major browser, it happily gave me the 12k of text content in a
> moderately complex page (various menus and controls, plus a 580-row,
> 3-column table containing links) in about a third of a second on my
> little Atom-class netbook.
>
> I created a bookmarklet[2] of it that reports character count, time,
> and such and ran it against a large, complex document (the current
> all-in-one-page HTML5 specification[3]) using Chrome, which gave me all
> 2,090,693 characters spread across 86,018 elements (I didn't count all
> nodes, just elements) in just under two seconds (again on the
> netbook). Firefox did the same in just under three seconds, and IE7
> (after taking several *minutes* -- and several script errors -- just
> to load the document) ran the bookmarklet in 12.5 seconds. Pretty
> decent for IE. :-) The character counts were identical between Chrome
> and Firefox; IE saw slightly fewer characters (1,891,293) and elements
> (85,972), but that could have been down to the script errors. Firefox
> reported one fewer element than Chrome.
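The bookmarklet's own code is only at the pastie link, so what follows is not that code. It is a hypothetical sketch of the same kind of single-pass measurement: it collects the characters, counts the descendant elements visited, and records the elapsed time. It again uses plain-object stand-ins for DOM nodes. Counting elements and gathering text in the same walk avoids a second traversal of a large document.

```javascript
// Hypothetical stand-ins for DOM nodes; only nodeType, tagName,
// nodeValue, firstChild and nextSibling are modelled.
function txt(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function el(tagName, ...children) {
  // Chain the children through nextSibling, as a real DOM would.
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType: 1, tagName: tagName.toUpperCase(),
           firstChild: children[0] || null, nextSibling: null };
}

// Walk once, collecting text (skipping SCRIPT contents) and counting every
// descendant element, including SCRIPT elements whose text is skipped.
function measureText(root) {
  const start = Date.now();
  const parts = [];
  let elements = 0;
  (function walk(node) {
    for (let n = node.firstChild; n; n = n.nextSibling) {
      if (n.nodeType === 3 || n.nodeType === 4) {
        parts.push(n.nodeValue);
      } else if (n.nodeType === 1) {
        elements++;
        if (n.tagName !== "SCRIPT") walk(n);
      }
    }
  })(root);
  const text = parts.join("");
  return { characters: text.length, elements, ms: Date.now() - start, text };
}
```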
>
> I haven't particularly tested or optimized that code, it's just a
> starting point. It builds things up in an array and uses #join at the
> end, which is probably slower for small tasks than jQuery's approach
> (string concatenation), but probably faster for large tasks (like the
> HTML spec). I say "probably" in each case because I haven't tested,
> and I've learned not to make performance assertions without data. :-)
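Here is a hypothetical sketch of both strategies over the same kind of plain-object node stand-ins. One pushes each text node's value into an array and joins once at the end. The other appends to a string as it goes. Both skip SCRIPT contents and, for brevity, descend only into elements. They return identical strings. Which one is faster depends on the engine and the document size, so it is something to time rather than assume.

```javascript
// Hypothetical stand-ins for DOM nodes; only nodeType, tagName,
// nodeValue, firstChild and nextSibling are modelled.
function txt(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function el(tagName, ...children) {
  // Chain the children through nextSibling, as a real DOM would.
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType: 1, tagName: tagName.toUpperCase(),
           firstChild: children[0] || null, nextSibling: null };
}

// Strategy 1: collect pieces in an array and join once at the end.
function textByJoin(root) {
  const parts = [];
  (function walk(node) {
    for (let n = node.firstChild; n; n = n.nextSibling) {
      if (n.nodeType === 3 || n.nodeType === 4) parts.push(n.nodeValue);
      else if (n.nodeType === 1 && n.tagName !== "SCRIPT") walk(n);
    }
  })(root);
  return parts.join("");
}

// Strategy 2: build the result by string concatenation as we go.
function textByConcat(root) {
  let out = "";
  for (let n = root.firstChild; n; n = n.nextSibling) {
    if (n.nodeType === 3 || n.nodeType === 4) out += n.nodeValue;
    else if (n.nodeType === 1 && n.tagName !== "SCRIPT") out += textByConcat(n);
  }
  return out;
}
```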
>
> [1] http://pastie.org/917566 (also quoted inline below)
> [2] http://pastie.org/917567
> [3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE* document)
>
> Code from [1] pasted inline:
> * * * *
> Element.addMethods((function() {
>
>     /**
>      * Element.textValue() -> String
>      *
>      * Gets the text within the element, ignoring any tags; i.e., returns
>      * the concatenation of all of its text nodes. Omits the text nodes
>      * within `script` elements.
>     **/
>     function textValue(element) {
>         if (!(element = $(element))) return;
>         var collector = [];
>         textValueCollector(element, collector);
>         return collector.join("");
>     }
>     function textValueCollector(element, collector) {
>         var node;
>
>         for (node = element.firstChild; node; node = node.nextSibling) {
>             switch (node.nodeType) {
>                 case 3: // text
>                 case 4: // cdata
>                     collector.push(node.nodeValue);
>                     break;
>                 case 8: // comment
>                     break;
>                 case 1: // element
>                     if (node.tagName == 'SCRIPT') {
>                         break;
>                     }
>                     // FALL THROUGH TO DEFAULT
>                 default:
>                     // Descend
>                     textValueCollector(node, collector);
>                     break;
>             }
>         }
>     }
>
>     return {textValue: textValue};
> })());
>
> * * * *
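textValueCollector recurses once per nesting level, so very deep markup means a deep call stack. The following is a hypothetical alternative, not part of the posted code. It walks the same nodes with an explicit stack and should produce the same output. It keeps the same rules: text and CDATA are collected, comments are ignored, SCRIPT is skipped, and everything else is descended into. It is shown over plain-object stand-ins so it runs outside a browser.

```javascript
// Hypothetical stand-ins for DOM nodes; only nodeType, tagName,
// nodeValue, firstChild and nextSibling are modelled.
function txt(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}
function comment(value) {
  return { nodeType: 8, nodeValue: value, firstChild: null, nextSibling: null };
}
function el(tagName, ...children) {
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType: 1, tagName: tagName.toUpperCase(),
           firstChild: children[0] || null, nextSibling: null };
}

function textValueIterative(root) {
  const parts = [];
  const stack = [];
  // Push children right-to-left so the leftmost child is popped first,
  // which keeps the output in document order.
  function pushChildren(node) {
    const kids = [];
    for (let n = node.firstChild; n; n = n.nextSibling) kids.push(n);
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  pushChildren(root);
  while (stack.length) {
    const node = stack.pop();
    switch (node.nodeType) {
      case 3: // text
      case 4: // CDATA section
        parts.push(node.nodeValue);
        break;
      case 8: // comment: ignored
        break;
      case 1: // element: skip SCRIPT contents entirely
        if (node.tagName === "SCRIPT") break;
        // falls through to descend into other elements
      default:
        pushChildren(node);
    }
  }
  return parts.join("");
}
```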
>
> -- T.J. :-)
>
> On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
>
> > We've been getting these requests in the past. Take a look at, for
> > example: 
> > <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
>
> > I still think that it's not a trivial solution (for the reasons
> > outlined in the post linked above) and so is best handled by a
> > standalone plugin. And using context-unaware `stripTags` on something
> > like HTML is usually asking for trouble :) (imagine what stripTags
> > would do to a string like this: "foo bar <script>function wrap(html)
> > { return 'div' + html + '</div>'}</script> baz"; and then there are
> > other elements with a CDATA content model, like STYLE)
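kangax's example is easy to reproduce. The function below is a deliberately naive, context-unaware stripper that deletes anything shaped like <tag> or </tag>. It stands in for the general approach by assumption; it makes no claim about the exact regular expression Prototype's stripTags uses.

```javascript
// Naive, context-unaware tag stripping: removes every "<...>" or "</...>"
// run without knowing that SCRIPT (or STYLE) content is not markup.
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, "");
}
```

On kangax's string it returns "foo bar function wrap(html) { return 'div' + html + ''} baz". The tags are gone, but the script's source now reads as if it were page text.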
>
> > --
> > kangax
>
> > On Apr 13, 8:20 am, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
>
> > > On Apr 13, 10:39 am, Eric <lefauv...@gmail.com> wrote:
>
> > > > wouldn't it be wiser to check for the native method once and use it?
>
> > > Probably. I'd also check for innerText (in fact, I'd check for that
> > > first), since it's supported by IE, WebKit (so Chrome, Safari), and
> > > Opera; only Mozilla holds out. textContent is supported by all of them
> > > except IE. So:
>
> > > Element.addMethods((function() {
>
> > >     return {
> > >         /**
> > >          * Element.text() -> String
> > >          *
> > >          * Gets the text within the element, ignoring any tags
> > >          * (essentially the sum of all of the text nodes within).
> > >         **/
> > >         text: (function() {
> > >             var element, testvalue;
>
> > >             element = document.createElement("span");
> > >             element.innerHTML = testvalue = "foo";
> > >             if (text_fromInnerText(element) == testvalue) {
> > >                 return text_fromInnerText;
> > >             }
> > >             if (text_fromTextContent(element) == testvalue) {
> > >                 return text_fromTextContent;
> > >             }
> > >             return text_fromStripping;
> > >         })()
> > >     };
>
> > >     // Get the element's inner text via innerText if available
> > >     // (IE, WebKit, Opera, ...)
> > >     function text_fromInnerText(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.innerText;
> > >     }
>
> > >     // Get the element's inner text via textContent if available
> > >     // (Gecko, WebKit, Opera, ...)
> > >     function text_fromTextContent(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.textContent;
> > >     }
>
> > >     // Get the element's inner text by getting innerHTML and stripping
> > >     // tags (fallback)
> > >     function text_fromStripping(element) {
> > >         if (!(element = $(element))) return;
> > >         return element.innerHTML.stripTags();
> > >     }
>
> > > })());
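The code above tries each native property once against a known probe value and keeps whichever works. That pattern can be factored out. Here is a hypothetical sketch; the `pickImplementation` helper and the plain-object probes are assumptions, not Prototype API. The probes imitate what a detached span containing "foo" would report in different browsers.

```javascript
// Return the first candidate whose result on the probe equals `expected`;
// if none qualifies, return the fallback. Intended to run once, at load time.
function pickImplementation(candidates, probe, expected, fallback) {
  for (const candidate of candidates) {
    try {
      if (candidate(probe) === expected) return candidate;
    } catch (e) {
      // A candidate that throws here is simply not selected.
    }
  }
  return fallback;
}

function fromInnerText(node) { return node.innerText; }
function fromTextContent(node) { return node.textContent; }
// Fallback: naive tag stripping of innerHTML (same caveats as stripTags).
function fromStripping(node) { return node.innerHTML.replace(/<\/?[^>]+>/g, ""); }
```

Consider a probe with textContent but no innerText, as in Mozilla at the time: `pickImplementation([fromInnerText, fromTextContent], probe, "foo", fromStripping)` selects fromTextContent. When innerText is present, it selects fromInnerText first, matching the order in the code above.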
>
> > > Do people think I should submit this to core? jQuery has an equivalent
> > > function, and I think I saw one in Closure as well. So it's not just
> > > the OP who wants to do this...
>
> > > -- T.J. :-)
>
> > > On Apr 13, 10:39 am, Eric <lefauv...@gmail.com> wrote:
>
> > > > Oooops, gmail sent the message before I finished... :o)
>
> > > > Here is the correct message (please ignore the previous one)
>
> > > > On Apr 12, 7:04 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
>
> > > > > Element.addMethods({
> > > > >     text: function(element) {
> > > > >         if (!(element = $(element))) return;
> > > > >         return element.innerHTML.stripTags();
> > > > >     }
> > > > > });
>
> > > > wouldn't it be wiser to check for the native method once and use it?
>
> > > > Something like (untested)
>
> > > > Element.addMethods({
> > > >     text: ($$('BODY').first().textContent === undefined)
> > > >         ? function(element) {
> > > >               if (!(element = $(element))) return;
> > > >               return element.innerText;
> > > >           }
> > > >         : function(element) {
> > > >               if (!(element = $(element))) return;
> > > >               return element.textContent;
> > > >           }
> > > > });
>
> > > > Eric
>
> > > > NB: I know, the testing condition is ugly... feel free to post a
> > > > better one :o)

-- 
You received this message because you are subscribed to the Google Groups 
"Prototype & script.aculo.us" group.
