innerText is one of those things IE got right, just like innerHTML. Let's
consider making it a standard instead of removing it. Also, please don't make
the mistake of thinking it is the same thing as textContent. Think of
textContent as pre-formatted text and innerText as plain text. IE even handles
a span styled display:block correctly and adds a line break.
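
To make the distinction concrete, here is a minimal sketch of what textContent does. It uses a hypothetical plain-object node model (the `text`/`el` helpers and the `display` field are illustrative assumptions, not DOM APIs). textContent simply concatenates every descendant text node in document order, so the source tabs and newlines and the script body all come through, and nothing marks where the display:block span starts:

```javascript
// Hypothetical node model (not the real DOM):
//   text node:    { type: "text", data: "..." }
//   element node: { type: "element", tag, display, children }
const text = (data) => ({ type: "text", data });
const el = (tag, display, ...children) => ({ type: "element", tag, display, children });

// What textContent does on an element: concatenate the data of every
// descendant text node in document order. CSS display, whitespace
// collapsing, and <script> are all ignored.
function textContentOf(node) {
  if (node.type === "text") return node.data;
  return node.children.map(textContentOf).join("");
}

const div = el("div", "block",
  text("\n\t"),
  el("span", "inline", text("Hello")),
  el("span", "block", text("world")),
  el("script", "none", text("var x = 1;")),
  text("\n"));

console.log(JSON.stringify(textContentOf(div)));
// "\n\tHelloworldvar x = 1;\n"
```

A rendering-aware innerText would instead put "world" on its own line.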


Michael, good try, but I've been down that road, and it's pretty hard to do.
Your version left in the script text, dropped spaces, and produced no line
breaks. You'd almost need an HTML parser. Take a sequence of tags like this:
p span span em strong p script ul li span li span

You need to know where there should be line breaks, or spaces, or neither. And
that's without considering all the other block or HTML5 elements, or tables,
etc. Nor is it as simple as testing whether the node is a block (or list-item,
etc.), because you also need to know how it relates to the previous and next
nodes; otherwise a span in a p will get line breaks.
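
One way to handle the previous/next-node problem, sketched here under assumptions: walk the tree, emit a break *request* at the start and end of every block or list-item element, and only turn requests into real line breaks afterwards, collapsing adjacent requests into one and dropping any at the very start or end. Text nodes have their whitespace runs collapsed to single spaces, and script/style are skipped. It uses the same kind of hypothetical plain-object node model (not the DOM; in a browser the display value would come from getComputedStyle) and ignores tables, pre, br, and the other cases a real innerText has to handle:

```javascript
// Hypothetical node model (not the real DOM):
//   text node:    { type: "text", data: "..." }
//   element node: { type: "element", tag, display, children }
const text = (data) => ({ type: "text", data });
const el = (tag, display, ...children) => ({ type: "element", tag, display, children });

const BREAK = Symbol("break"); // a requested line break, resolved later
const SKIPPED_TAGS = new Set(["script", "style"]);
const BLOCK_DISPLAYS = new Set(["block", "list-item"]);

// Flatten the tree into text fragments and break requests.
function collect(node, out) {
  if (node.type === "text") {
    // Collapse each run of HTML whitespace to a single space.
    out.push(node.data.replace(/[ \t\n\r\f]+/g, " "));
    return;
  }
  if (SKIPPED_TAGS.has(node.tag) || node.display === "none") return;
  const isBlock = BLOCK_DISPLAYS.has(node.display);
  if (isBlock) out.push(BREAK);
  for (const child of node.children) collect(child, out);
  if (isBlock) out.push(BREAK);
}

function renderedText(root) {
  const parts = [];
  collect(root, parts);
  // After collapsing, text fragments contain no "\n", so every "\n" is a break request.
  return parts
    .map((part) => (part === BREAK ? "\n" : part))
    .join("")
    .replace(/ {2,}/g, " ")           // spaces from adjacent text nodes collapse too
    .replace(/ *\n */g, "\n")         // no spaces at the start or end of a line
    .replace(/\n{2,}/g, "\n")         // neighbouring block boundaries give one break
    .replace(/^[ \n]+|[ \n]+$/g, ""); // nothing dangling at the very start or end
}

// Roughly the tag sequence above: p span span em strong p script ul li span li span
const doc = el("div", "block",
  el("p", "block",
    text("\n  Some "), el("span", "inline", text("inline")), text(" "),
    el("span", "inline", text("text")), text("\n"),
    el("em", "inline", text("with")), text(" "),
    el("strong", "inline", text("emphasis"))),
  el("p", "block", text("Second paragraph")),
  el("script", "none", text("var x = 1;")),
  el("ul", "block",
    el("li", "list-item", el("span", "inline", text("one"))),
    el("li", "list-item", el("span", "inline", text("two")))));

console.log(renderedText(doc));
// Some inline text with emphasis
// Second paragraph
// one
// two
```

Because breaks are only requested during the walk and resolved at the end, an inline span inside a p never produces a break, two sibling paragraphs yield a single break between them, and a block-level span inside a p still starts its own line.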

Mike Wilcox
http://clubajax.org
[email protected]



On Aug 15, 2010, at 7:41 AM, Michael A. Puls II wrote:

> On Sat, 14 Aug 2010 20:03:30 -0400, Mike Wilcox <[email protected]> wrote:
> 
>> Wow, I was just thinking of proposing this myself a few days ago.
>> 
>> In addition to Adam's comments, there is no standard, stable way of 
>> *getting* the text from a series of nodes. textContent returns everything, 
>> including tabs, white space, and even script content.
> 
> Well, you can do stuff like this:
> 
> ------
> (function() {
>    function trim(s) {
>        return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
>    }
>    function setInnerText(v) {
>        this.textContent = v;
>    }
>    function getInnerText() {
>        var iter = this.ownerDocument.createNodeIterator(this,
>        NodeFilter.SHOW_TEXT, null, null);
>        var ret = "";
>        var first = true;
>        for (var node; (node = iter.nextNode()); ) {
>            var fixed = trim(node.nodeValue.replace(/\r|\n|\t/g, ""));
>            if (fixed.length > 0) {
>                if (!first) {
>                    ret += " ";
>                }
>                ret += fixed;
>                first = false;
>            }
>        }
>        return ret;
>    }
>    HTMLElement.prototype.__defineGetter__('myInnerText', getInnerText);
>    HTMLElement.prototype.__defineSetter__('myInnerText', setInnerText);
> })();
> ------
> 
> and adjust how you handle spaces and build the string etc. as you see fit. 
> Then, it's just alert(el.myInnerText).
> 
> NodeIterator's standard. __defineGetter/Setter__ is de-facto standard (and 
> you have Object.defineProperty as standard for those that support it). How 
> newlines and tabs and spaces are stripped/normalized just isn't standardized 
> in this case. But that might be different depending on the application.
> 
> Or, just run a regex on textContent.
> 
> -- 
> Michael
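
For reference, here is the accessor pair from the quoted code installed with the standard Object.defineProperty mentioned there, instead of __defineGetter__/__defineSetter__, with a getter that takes the "just run a regex on textContent" route. It is only a sketch: the `Widget` class and `label` property are hypothetical stand-ins for HTMLElement.prototype and myInnerText so that it runs outside a browser. The descriptor sets enumerable and configurable to true, which is what __defineGetter__ produces:

```javascript
// Hypothetical stand-in for HTMLElement so this runs outside a browser.
class Widget {
  constructor(text) {
    this.textContent = text;
  }
}

function getLabel() {
  // "Just run a regex on textContent": collapse whitespace runs and trim.
  return this.textContent.replace(/\s+/g, " ").trim();
}

function setLabel(v) {
  this.textContent = v;
}

// Standard equivalent of:
//   Widget.prototype.__defineGetter__("label", getLabel);
//   Widget.prototype.__defineSetter__("label", setLabel);
Object.defineProperty(Widget.prototype, "label", {
  get: getLabel,
  set: setLabel,
  enumerable: true,
  configurable: true,
});

const w = new Widget("  Hello\n\tworld  ");
console.log(w.label); // Hello world
w.label = "replaced";
console.log(w.textContent); // replaced
```

Like any single regex over textContent, this puts everything on one line, so it still has the missing-line-break problem described at the top of this thread.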
