Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On 17 Jun 2010, at 20:37, Alexey Proskuryakov wrote:

> On 17.06.2010, at 9:53, Andreas Delmelle wrote:
>
>> If WebKit chooses, for example, to ignore character events from the parser in nodes where logically it doesn't make sense to have stray characters
>
> That would break e.g. Web sites where JS accesses DOM in ways such as node.firstChild.nextSibling, or node.childNodes[3]. We've previously seen similar breakage happen after changing WebCore parsing code.

Wow, good point! Suddenly I feel foolish, not having thought of that hyper-trivial scenario. Obviously a very good reason to keep those nodes in.

Still, one wonders from time to time how much bandwidth is actually wasted by sending over all these extraneous bytes that ultimately compel JS developers to write code like the above. I don't think I have ever seen a website that does /not/ serve its HTML pretty-printed... That seems like an awful lot of spaces, tabs and linefeeds!

On the other hand, node.firstChild.nextSibling just seems like asking for trouble. One could argue that people who do use that to get to the first element node do not need to be accommodated. It would suffice for one of the page's authors to insert a small comment node to break that code. One could just as easily extend Node with a firstElement() method that would work under all circumstances -- but, oh yes, IE didn't support that back then... ;-)

Regards,

Andreas Delmelle
---
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
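To make the breakage concrete, here is a minimal sketch. It uses Python's xml.dom.minidom as a stand-in for a browser DOM, which is an assumption made only so the example runs on its own; the thread itself is about WebKit and JavaScript. The helper name first_element is hypothetical and simply mirrors the firstElement() idea mentioned above.

```python
from xml.dom import minidom

# The same list, once pretty-printed and once with all inter-element
# whitespace removed (roughly what a whitespace-stripping parser would build).
PRETTY = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>"
COMPACT = "<ul><li>one</li><li>two</li></ul>"


def first_element(node):
    """Return the first child that is an element, skipping text and comments."""
    child = node.firstChild
    while child is not None and child.nodeType != child.ELEMENT_NODE:
        child = child.nextSibling
    return child


pretty = minidom.parseString(PRETTY).documentElement
compact = minidom.parseString(COMPACT).documentElement

# Pretty-printed: firstChild is a whitespace text node, so
# firstChild.nextSibling happens to be the first <li>.
print(pretty.firstChild.nextSibling.firstChild.data)   # one

# Stripped: the identical expression now lands on the second <li>.
print(compact.firstChild.nextSibling.firstChild.data)  # two

# The helper does not depend on whether whitespace nodes are present.
print(first_element(pretty).firstChild.data)           # one
print(first_element(compact).firstChild.data)          # one
```

Code written against the first tree silently picks a different element once the whitespace nodes disappear, which is the compatibility concern raised above.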
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
firstElementChild already exists in modern browsers:
http://www.w3.org/TR/ElementTraversal/#interface-elementTraversal

Anyway, this thread is done.

On Fri, Jun 18, 2010 at 10:27 AM, Andreas Delmelle andreas.delme...@telenet.be wrote:

> On 17 Jun 2010, at 20:37, Alexey Proskuryakov wrote:
>
>> On 17.06.2010, at 9:53, Andreas Delmelle wrote:
>>
>>> If WebKit chooses, for example, to ignore character events from the parser in nodes where logically it doesn't make sense to have stray characters
>>
>> That would break e.g. Web sites where JS accesses DOM in ways such as node.firstChild.nextSibling, or node.childNodes[3]. We've previously seen similar breakage happen after changing WebCore parsing code.
>
> Wow, good point! Suddenly I feel foolish, not having thought of that hyper-trivial scenario. Obviously a very good reason to keep those nodes in.
>
> Still, one wonders from time to time how much bandwidth is actually wasted by sending over all these extraneous bytes that ultimately compel JS developers to write code like the above. I don't think I have ever seen a website that does /not/ serve its HTML pretty-printed... That seems like an awful lot of spaces, tabs and linefeeds!
>
> On the other hand, node.firstChild.nextSibling just seems like asking for trouble. One could argue that people who do use that to get to the first element node do not need to be accommodated. It would suffice for one of the page's authors to insert a small comment node to break that code. One could just as easily extend Node with a firstElement() method that would work under all circumstances -- but, oh yes, IE didn't support that back then... ;-)
>
> Regards,
>
> Andreas Delmelle
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On 16 Jun 2010, at 23:12, David Hyatt wrote:

> On Jun 14, 2010, at 7:00 PM, Matt 'Murph' Finnicum wrote:
>
>> Why are there so many Text nodes in the DOM? I had a look at the initial DOM tree from rendering slashdot, and there are 1959 Text nodes. Of those 1959, 1246 were whitespace-only nodes. Does there need to be this many nodes? Why can't whitespace be combined with the nodes next to it?
>
> Whitespace nodes most commonly occur between elements, so they can't be coalesced.

Hmm, this touches on a very interesting topic...

Strictly speaking, a basic parser, be it XML or HTML, should never, ever report anything to downstream consumers that was not in the original source document. The software is doing its job pretty accurately in that respect. All it needs is a little help from the consumer/user/developer. Think JS minifying: this saves on bandwidth and even, IIRC, makes the compiler's job easier.

Basically, almost all of that whitespace serves only one purpose: to make the source human-readable. All well while you're developing a website or webapp, but come deployment, you will always fare better if the input stream is guaranteed to be processor-friendly to begin with. Fewer ghosts to chase.

If the input is at least XHTML, and one is a tiny bit versed in XSLT, adding a preprocessing whitespace-stripper stylesheet could be a quick-fix solution to reduce the waste of resources. That does consume resources elsewhere, obviously, so you may want to check if it's really worth the effort. xml:space, you would get for free if the processor is compliant in that respect. For the remainder, basically a plain copy template for all nodes. The exception being text nodes, for which you can use normalize-space() to see if they contain anything other than XML whitespace, and thus need copying.

The limitation is that you do not have access to the resolved CSS, IIC. In other words, if you have elements that can have #PCDATA content and that get a class assigned that sets properties related to whitespace preservation, the XSLT stylesheet will not see it (although there may exist extensions for CSS parsing, not sure). Then again, whitespace within, before or after text nodes is no problem, since that is presumed significant by default (but that gets coalesced with the other text later on, so no issue at all).

Part of it could, maybe, remotely, be implemented in WebKit itself. If WebKit chooses, for example, to ignore character events from the parser in nodes where logically it doesn't make sense to have stray characters (which, incidentally, is the strategy Apache FOP uses, but that may be a slightly different story since that is pure XML), it could mean a significant reduction of the above 1246 nodes... perhaps even to 0?

Downsides? The live DOM no longer *exactly* reflects the input, so it would definitely need to be configurable, just in case one does need that functionality. OTOH, let's say that 95% of a site's visitors are not interested at all in what the HTML source looks like. If you really want to share your code with the other 5%, there are far better ways to do that than relying on 'View Source', no?

For the remainder, I must admit I am having a hard time imagining scenarios where ignorable whitespace would be desirable to keep around. In the worst case, it could even needlessly complicate certain layout- and rendering-related tasks...

Regards,

Andreas Delmelle
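The stripping pass described above (copy everything, drop text nodes that are empty after whitespace normalization, honour xml:space) can be sketched outside XSLT as well. The following is only an illustration in Python with xml.dom.minidom, not the stylesheet itself; the function name strip_ignorable is hypothetical, and like the XSLT approach it knows nothing about CSS white-space rules.

```python
from xml.dom import minidom

XML_WHITESPACE = " \t\r\n"  # the four characters XML treats as whitespace


def strip_ignorable(node, preserve=False):
    """Remove whitespace-only text nodes below `node`, in place.

    Subtrees under xml:space="preserve" are left untouched.
    """
    if node.nodeType == node.ELEMENT_NODE:
        space = node.getAttribute("xml:space")
        if space == "preserve":
            preserve = True
        elif space == "default":
            preserve = False
    # Iterate over a copy, because children are removed while walking.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not preserve and child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_ignorable(child, preserve)


SOURCE = (
    "<div>\n"
    "  <p>a <b>b</b> c</p>\n"
    '  <pre xml:space="preserve"> <i>x</i> </pre>\n'
    "</div>"
)
root = minidom.parseString(SOURCE).documentElement
strip_ignorable(root)
print(root.toxml())
```

Text that merely starts or ends with whitespace ("a " and " c" in the sample) is kept as-is, matching the point that only whitespace-only nodes are candidates for removal.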
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:

> David, it's a bit more than annoying, it's fragmenting memory for no good. In the long run on systems with small memory it does make a difference :-/
>
> I'd like to see some option, maybe compile-time, to strip these useless whitespaces.

As Alexey points out, this is a compatibility issue though. People write code assuming the whitespace nodes are there. If you remove them, you'll see Web site breakage.

dave
(hy...@apple.com)
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On Thu, Jun 17, 2010 at 4:19 PM, David Hyatt hy...@apple.com wrote:

> On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:
>
>> David, it's a bit more than annoying, it's fragmenting memory for no good. In the long run on systems with small memory it does make a difference :-/
>>
>> I'd like to see some option, maybe compile-time, to strip these useless whitespaces.
>
> As Alexey points out, this is a compatibility issue though. People write code assuming the whitespace nodes are there. If you remove them, you'll see Web site breakage.
>
> dave
> (hy...@apple.com)

Do people write code assuming the content of the whitespace nodes? That seems very unlikely to me. If not, we could collapse them and be much more efficient about things (such as a simple flag in their parent node that represents their existence).

--Murph
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On Thu, Jun 17, 2010 at 6:24 PM, Matt 'Murph' Finnicum mattf...@gmail.com wrote:

> On Thu, Jun 17, 2010 at 4:19 PM, David Hyatt hy...@apple.com wrote:
>
>> On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:
>>
>>> David, it's a bit more than annoying, it's fragmenting memory for no good. In the long run on systems with small memory it does make a difference :-/
>>>
>>> I'd like to see some option, maybe compile-time, to strip these useless whitespaces.
>>
>> As Alexey points out, this is a compatibility issue though. People write code assuming the whitespace nodes are there. If you remove them, you'll see Web site breakage.
>>
>> dave
>> (hy...@apple.com)
>
> Do people write code assuming the content of the whitespace nodes? That seems very unlikely to me. If not, we could collapse them and be much more efficient about things (such as a simple flag in their parent node that represents their existence)

Same experience on my side: most JS libs that handle whitespace do it just to ignore the nodes, running the text through strip and then checking for emptiness.

--
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--
MSN: barbi...@gmail.com
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202
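The "strip, then check for emptiness" idiom mentioned above looks roughly like the sketch below. It is written in Python against xml.dom.minidom purely so it can be run standalone; is_ignorable and significant_children are hypothetical names, not functions from any particular JS library.

```python
from xml.dom import minidom


def is_ignorable(node):
    """True for a text node whose content is empty once whitespace is stripped."""
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ""


def significant_children(node):
    """Children of `node`, minus whitespace-only text nodes."""
    return [child for child in node.childNodes if not is_ignorable(child)]


ROW = "<tr>\n  <td>1</td>\n  tail\n  <td>2</td>\n</tr>"
row = minidom.parseString(ROW).documentElement

# Five raw children (three text nodes, two <td>), but only three that matter:
# the two cells and the text node that actually contains "tail".
print(len(row.childNodes), len(significant_children(row)))  # 5 3
```

Code like this only tests whether a text node is blank; it never depends on which whitespace characters the node holds, which is the observation the collapse proposal relies on.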
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
On Jun 17, 2010, at 4:07 PM, Eric Seidel wrote:

> A does not follow from B in that sentence, that current memory fragmentation means we need to strip whitespace nodes.

Yeah exactly. Let's see some measurements that show the presence of these nodes is a real problem.

> It would also be possible to create a special shared \n text node, and have some sort of Copy On Write behavior. Again, more complexity. Not sure if the complexity would be worth the perf win.

There might be interesting tricks around this idea yeah. Basically you want the common case of never really looking at the DOM from JS to result in efficient performance/memory use. If walking the DOM then results in the creation of a more heavyweight whitespace Node, that would be fine I think.

The idea of trying to fold the information into Element is an interesting one. Again I think we'd need measurements to see how much of a gain we'd typically get if we did this just for newlines for example, or if we'd need an optimization that extended to arbitrary whitespace sequences. I tend to think the latter would be required (since cleanly indented HTML will have newlines and tabs and spaces for indentation in between elements).

I'm not particularly interested in a break-the-web mode that strips out those Nodes completely by default, since (a) breaking compatibility is bad and (b) nobody has shown me any evidence that these extra nodes are a problem on mobile devices. An optimization that reduces memory use while preserving the correct behavior is much more appealing to me.

dave
(hy...@apple.com)
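One way to picture the "cheap until somebody walks the DOM" idea is the toy model below. It is a sketch under stated assumptions, not WebKit's design: the Element and Text classes, the _slots list and the child_nodes accessor are all hypothetical, and Python's sys.intern stands in for sharing one copy of each distinct whitespace run.

```python
import sys


class Text:
    """Stand-in for a full, heavyweight text node."""

    def __init__(self, data):
        self.data = data


class Element:
    def __init__(self, name):
        self.name = name
        # Each slot holds an Element, a Text, or a bare interned str that
        # stands in for a whitespace-only Text node nobody has looked at yet.
        self._slots = []

    def append_element(self, element):
        self._slots.append(element)

    def append_text(self, data):
        if data.strip() == "":
            # Identical whitespace runs share a single string object.
            self._slots.append(sys.intern(data))
        else:
            self._slots.append(Text(data))

    @property
    def child_nodes(self):
        # First access from "script": turn placeholders into real nodes so
        # callers see exactly the children a normal DOM would expose.
        for i, slot in enumerate(self._slots):
            if isinstance(slot, str):
                self._slots[i] = Text(slot)
        return list(self._slots)

    def heavyweight_count(self):
        """Number of real Text objects currently allocated under this element."""
        return sum(isinstance(slot, Text) for slot in self._slots)


ul = Element("ul")
for _ in range(3):
    ul.append_text("\n  ")
    ul.append_element(Element("li"))
ul.append_text("\n")

print(ul.heavyweight_count())  # 0
print(len(ul.child_nodes))     # 7
print(ul.heavyweight_count())  # 4
```

Because placeholders record arbitrary whitespace strings rather than just "a newline was here", the sketch covers the indentation case (newlines plus spaces or tabs) that the message above expects to be necessary. Whether the bookkeeping pays off is exactly the measurement question raised in the thread.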
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
I have seen sites that make this assumption. Even worked on one.

From a website writer's view, the whitespace nodes are usually a pain, but if you add some hacks to skip around them (as ugly as that is), you expect those hacks to keep working.

trey

On Jun 17, 2010, at 2:19 PM, David Hyatt wrote:

> On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:
>
>> David, it's a bit more than annoying, it's fragmenting memory for no good. In the long run on systems with small memory it does make a difference :-/
>>
>> I'd like to see some option, maybe compile-time, to strip these useless whitespaces.
>
> As Alexey points out, this is a compatibility issue though. People write code assuming the whitespace nodes are there. If you remove them, you'll see Web site breakage.
>
> dave
> (hy...@apple.com)
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
I filed https://bugs.webkit.org/show_bug.cgi?id=40800 to track this issue. I think we can take further discussion to the bug.

dave
(hy...@apple.com)
Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)
Whitespace nodes most commonly occur between elements, so they can't be coalesced.

dave

On Jun 14, 2010, at 7:00 PM, Matt 'Murph' Finnicum wrote:

> Why are there so many Text nodes in the DOM? I had a look at the initial DOM tree from rendering slashdot, and there are 1959 Text nodes. Of those 1959, 1246 were whitespace-only nodes.
>
> Does there need to be this many nodes? Why can't whitespace be combined with the nodes next to it?
>
> Thanks,
> --Murph
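For anyone wanting to reproduce this kind of count, the sketch below tallies text nodes and whitespace-only text nodes in a parsed tree. It uses Python's xml.dom.minidom on a small well-formed sample, which is an assumption for the sake of a runnable example; the figures in the question came from a live page in WebKit, not from this code. The function name count_text_nodes is hypothetical.

```python
from xml.dom import minidom


def count_text_nodes(node):
    """Return (all text nodes, whitespace-only text nodes) below `node`."""
    total = blank = 0
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            total += 1
            if child.data.strip() == "":
                blank += 1
        else:
            sub_total, sub_blank = count_text_nodes(child)
            total += sub_total
            blank += sub_blank
    return total, blank


SAMPLE = "<body>\n  <h1>Title</h1>\n  <p>Some <em>text</em>.</p>\n</body>"
total, blank = count_text_nodes(minidom.parseString(SAMPLE))
print(total, blank)  # 7 3

# Share of whitespace-only nodes for the figures quoted in the question.
print(f"{1246 / 1959:.1%}")  # 63.6%
```

The whitespace-only nodes in the sample all sit between elements, and each is separated from the nearest other text node by an element boundary, which is why they cannot simply be merged into a neighbouring text node.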