On 16 Jun 2010, at 23:12, David Hyatt wrote:
> On Jun 14, 2010, at 7:00 PM, Matt 'Murph' Finnicum wrote:
>> Why are there so many Text nodes in the DOM? I had a look at the initial DOM
>> tree from rendering slashdot, and there are 1959 Text nodes. Of those 1959,
>> 1246 were whitespace-only nodes.
>> Does there need to be this many nodes? Why can't whitespace be combined with
>> the nodes next to it?
> Whitespace nodes most commonly occur between elements, so they can't be
> coalesced.
Hmm, this touches on a very interesting topic...
Strictly speaking, a basic parser, be it XML or HTML, should report the
original source document faithfully to downstream consumers: nothing that was
not in it, and nothing silently left out. The software is doing its job pretty
accurately in that respect. All it needs is a little help from the
consumer/user/developer. Think of JS minification: it saves on bandwidth and
even, IIRC, makes the compiler's job easier.
Basically, almost all of that whitespace serves only one purpose: to make the
source human-readable. All well and good while you're developing a website or
webapp, but come deployment, you will always fare better if the input stream is
guaranteed to be processor-friendly to begin with. Fewer ghosts to chase.
If the input is at least XHTML, and one is a tiny bit versed in XSLT, adding a
preprocessing whitespace stripper stylesheet could be a quick-fix solution to
reduce the waste of resources. That does consume resources elsewhere,
obviously, so you may want to check if it's really worth the effort.
You would get xml:space handling for free if the processor is compliant in that
respect. For the remainder, the stylesheet is basically a plain copy template
for all nodes. The exception is text nodes, for which you can use
normalize-space() to check whether they contain anything other than XML
whitespace, and thus need copying.
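
To make the copy-everything-except-blank-text-nodes idea concrete, here is a
minimal sketch of the same logic in Python, using the standard library's
xml.dom.minidom instead of an actual XSLT stylesheet. The function names
(preserves_space, strip_ignorable_whitespace, count_text_nodes) are made up
for illustration, and the xml:space check is done by hand here. Treat it as a
rough equivalent under those assumptions, not as a drop-in preprocessor.

```python
from xml.dom import minidom, Node

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_WHITESPACE = " \t\r\n"  # the four characters XML treats as whitespace


def preserves_space(element):
    """Return True if the nearest xml:space on element or an ancestor is 'preserve'."""
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        value = node.getAttributeNS(XML_NS, "space") or node.getAttribute("xml:space")
        if value:
            return value == "preserve"
        node = node.parentNode
    return False


def strip_ignorable_whitespace(node):
    """Remove whitespace-only text nodes in place, honoring xml:space."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == Node.TEXT_NODE:
            # Rough equivalent of testing normalize-space(.) = '' in XSLT.
            if child.data.strip(XML_WHITESPACE) == "" and not preserves_space(node):
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_ignorable_whitespace(child)


def count_text_nodes(node):
    """Count all text nodes below node (hypothetical helper for the demo)."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            total += 1
        else:
            total += count_text_nodes(child)
    return total


if __name__ == "__main__":
    source = (
        "<ul>\n  <li>one</li>\n  <li> two </li>\n"
        '  <pre xml:space="preserve"> </pre>\n</ul>'
    )
    doc = minidom.parseString(source)
    print("text nodes before:", count_text_nodes(doc))
    strip_ignorable_whitespace(doc)
    print("text nodes after:", count_text_nodes(doc))
    print(doc.documentElement.toxml())
```

Note that text nodes with real content keep their leading and trailing
whitespace untouched; only nodes that consist of nothing but whitespace are
dropped.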
The limitation is that you do not have access to the resolved CSS, if I am
correct. In other words, if you have elements that can have #PCDATA content and
that get a class assigned that sets properties related to whitespace
preservation, the XSLT stylesheet will not see it (although extensions for CSS
parsing may exist, I am not sure).
Then again, whitespace at the start, at the end or in the middle of a text node
that also has other content is no problem, since that is presumed significant
by default (and it gets coalesced with the surrounding text later on, so no
issue at all).
Part of it could, maybe, remotely, be implemented in WebKit itself.
If WebKit chooses, for example, to ignore character events from the parser in
nodes where logically it doesn't make sense to have stray characters (which,
incidentally, is the strategy Apache FOP uses, but that may be a slightly
different story since that is pure XML), it could mean a significant reduction
of the above 1246 nodes... perhaps even to 0?
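
As a rough illustration of that strategy (not WebKit's or FOP's actual code),
here is a small sketch in Python with the standard xml.sax module. The handler
drops whitespace-only character events whose parent is an element that
logically holds only child elements. The ELEMENT_ONLY set and the class name
are assumptions made up for this example; a real implementation would derive
that set from the content model.

```python
import xml.sax

# Hypothetical list of elements in which stray characters make no sense.
ELEMENT_ONLY = frozenset({"html", "head", "ul", "ol", "table", "tbody", "tr"})
XML_WHITESPACE = " \t\r\n"


class WhitespaceDroppingHandler(xml.sax.ContentHandler):
    """Collects character data, ignoring blank runs inside element-only parents."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.kept = []           # character chunks passed on downstream
        self.dropped = 0         # number of character events ignored

    def startElement(self, name, attrs):
        self.open_elements.append(name)

    def endElement(self, name):
        self.open_elements.pop()

    def characters(self, content):
        parent = self.open_elements[-1] if self.open_elements else None
        if parent in ELEMENT_ONLY and content.strip(XML_WHITESPACE) == "":
            self.dropped += 1  # no Text node would be created for this event
            return
        self.kept.append(content)


if __name__ == "__main__":
    source = b"<html><ul>\n  <li>one</li>\n  <li> two </li>\n</ul></html>"
    handler = WhitespaceDroppingHandler()
    xml.sax.parseString(source, handler)
    print("kept text:", repr("".join(handler.kept)))
    print("dropped character events:", handler.dropped)
```

The dropped counter counts parser events, not nodes, since a parser is free to
deliver one run of characters in several chunks.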
Downsides? The live DOM no longer *exactly* reflects the input, so it would
definitely need to be configurable, just in case one does need that
functionality. OTOH, let's say that 95% of a site's visitors are not interested
at all in what the HTML source looks like. If you really want to share your
code with the other 5%, there are far better ways to do that than relying on
'View Source', no? For the remainder, I must admit I am having a hard time
imagining scenarios where ignorable whitespace would be desirable to keep
around. In the worst case, it could even needlessly complicate certain layout-
and rendering-related tasks...
Regards,
Andreas Delmelle
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev