Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-18 Thread Andreas Delmelle

On 17 Jun 2010, at 20:37, Alexey Proskuryakov wrote:

 
 On 17.06.2010, at 9:53, Andreas Delmelle wrote:
 
 If WebKit chooses, for example, to ignore character events from the parser 
 in nodes where logically it doesn't make sense to have stray characters
 
 
 That would break e.g. Web sites where JS accesses DOM in ways such as 
 node.firstChild.nextSibling, or node.childNodes[3]. We've previously seen 
 similar breakage happen after changing WebCore parsing code.

Wow, good point! Suddenly I feel foolish, not having thought of that 
hyper-trivial scenario. Obviously a very good reason to keep those nodes in. 
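
To make the breakage concrete, here is a minimal sketch. It assumes Python's xml.dom.minidom as a stand-in for the browser DOM (it keeps whitespace-only text nodes, as WebKit does); the sample markup and the helper name first_child_next_sibling are made up for illustration.

```python
from xml.dom import minidom

# The same list, once pretty-printed and once with the whitespace removed.
PRETTY = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>"
COMPACT = "<ul><li>one</li><li>two</li></ul>"


def first_child_next_sibling(markup):
    """Return what node.firstChild.nextSibling yields on the root element."""
    root = minidom.parseString(markup).documentElement
    return root.firstChild.nextSibling


pretty_hit = first_child_next_sibling(PRETTY)
compact_hit = first_child_next_sibling(COMPACT)

# With whitespace nodes, firstChild is the "\n  " text node, so nextSibling
# is the first <li>.
print(pretty_hit.firstChild.data)
# Without them, firstChild is already the first <li>, so the very same
# expression silently lands on the second <li>.
print(compact_hit.firstChild.data)
```

Script written against the pretty-printed tree keeps running if the whitespace nodes disappear, but it operates on the wrong element, which is the kind of breakage Alexey describes.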

Still, one wonders from time to time how much bandwidth is actually wasted by 
sending over all these extraneous bytes that ultimately compel JS developers to 
write code like the above. I don't think I have ever seen a website that does 
/not/ serve its HTML pretty-printed... That seems like an awful lot of spaces, 
tabs and linefeeds!

On the other hand, node.firstChild.nextSibling just seems like asking for 
trouble. One could argue that people who do use that to get to the first 
element node do not need to be accommodated. It would suffice for one of the 
page's authors to insert a small comment node to break that code.

One could just as easily extend Node with a firstElement() method that would 
work under all circumstances --but, oh yes, IE didn't support that back then... 
;-)


Regards,

Andreas Delmelle
---

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-18 Thread Eric Seidel
firstElementChild already exists in modern browsers:
http://www.w3.org/TR/ElementTraversal/#interface-elementTraversal
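
For illustration only, the idea behind firstElementChild can be sketched as follows. This assumes Python's xml.dom.minidom in place of the browser DOM; minidom has no such attribute, so first_element_child below is a hypothetical helper, and the markup is invented.

```python
from xml.dom import minidom


def first_element_child(node):
    """Return the first child that is an element, or None if there is none."""
    for child in node.childNodes:
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            return child
    return None


doc = minidom.parseString("<div>\n  <!-- note -->\n  <p>hello</p>\n</div>")
div = doc.documentElement

# Positional access: the whitespace node is skipped by hand, but the comment
# inserted by a page author is hit instead of <p>.
positional = div.firstChild.nextSibling

# Type-based access: text and comment nodes are skipped alike.
element = first_element_child(div)
print(positional.nodeType == minidom.Node.COMMENT_NODE, element.tagName)
```

Selecting by node type rather than by position is what makes such code independent of how the source happens to be indented or commented.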

Anyway, this thread is done.



Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread Andreas Delmelle
On 16 Jun 2010, at 23:12, David Hyatt wrote:

 On Jun 14, 2010, at 7:00 PM, Matt 'Murph' Finnicum wrote:
 
 Why are there so many Text nodes in the DOM? I had a look at the initial DOM 
 tree from rendering slashdot, and there are 1959 Text nodes. Of those 1959, 
 1246 were whitespace-only nodes.
 
 Does there need to be this many nodes? Why can't whitespace be combined with 
 the nodes next to it?

 Whitespace nodes most commonly occur between elements, so they can't be 
 coalesced.

Hmm, this touches on a very interesting topic...

Strictly speaking, a basic parser, be it XML or HTML, should never, ever report 
anything to downstream consumers that was not in the original source document. 
The software is doing its job pretty accurately in that respect. All it needs 
is a little help from the consumer/user/developer. Think JS minification: it 
saves on bandwidth and even, IIRC, makes the compiler's job easier.

Basically, almost all of that whitespace serves only one purpose: to make the 
source human-readable. All well and good while you're developing a website or 
webapp, but come deployment, you will always fare better if the input stream is 
guaranteed to be processor-friendly to begin with. Fewer ghosts to chase.

If the input is at least XHTML, and one is a tiny bit versed in XSLT, adding a 
preprocessing whitespace stripper stylesheet could be a quick-fix solution to 
reduce the waste of resources. That does consume resources elsewhere, 
obviously, so you may want to check if it's really worth the effort.
You would get xml:space handling for free if the processor is compliant in that 
respect. For the remainder, it is basically a plain copy template for all nodes, 
the exception being text nodes, for which you can use normalize-space() to see if 
they contain anything other than XML whitespace, and thus need copying.
The limitation is that you do not have access to the resolved CSS, IIRC. In 
other words, if you have elements that can have #PCDATA content and that get a 
class assigned that sets properties related to whitespace preservation, the 
XSLT stylesheet will not see it (although there may exist extensions for CSS 
parsing, not sure). 
Then again, whitespace within, before or after text nodes is no problem, since 
that is presumed significant by default (but that gets coalesced with the other 
text later on, so no issue at all).
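
The stripping pass described above can be sketched without XSLT. What follows is a rough Python equivalent on top of xml.dom.minidom, not the stylesheet itself: strip_ignorable_whitespace and the sample markup are assumptions made for illustration, and xml:space is handled by hand here instead of coming for free from a compliant processor.

```python
from xml.dom import XML_NAMESPACE, minidom


def strip_ignorable_whitespace(element, preserve=False):
    """Remove whitespace-only text nodes in place, honoring xml:space."""
    space = element.getAttributeNS(XML_NAMESPACE, "space")
    if space == "preserve":
        preserve = True
    elif space == "default":
        preserve = False
    # Iterate over a copy, because children are removed while walking.
    for child in list(element.childNodes):
        if child.nodeType == minidom.Node.ELEMENT_NODE:
            strip_ignorable_whitespace(child, preserve)
        elif (child.nodeType == minidom.Node.TEXT_NODE
              and not preserve
              and not child.data.strip()):
            element.removeChild(child)


SOURCE = (
    "<body>\n"
    "  <p>Hello world</p>\n"
    "  <pre xml:space=\"preserve\">\n  </pre>\n"
    "</body>"
)
doc = minidom.parseString(SOURCE)
body = doc.documentElement
strip_ignorable_whitespace(body)
print(body.toxml())
```

One caveat the sketch shares with the stylesheet approach: a whitespace-only node between two inline elements (for example between a closing b tag and an opening i tag) does affect rendering, so a blanket strip is only safe where such content cannot occur.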

Part of it could, maybe, remotely, be implemented in WebKit itself.
If WebKit chooses, for example, to ignore character events from the parser in 
nodes where logically it doesn't make sense to have stray characters (which, 
incidentally, is the strategy Apache FOP uses, but that may be a slightly 
different story since that is pure XML), it could mean a significant reduction 
of the above 1246 nodes... perhaps even to 0? 
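
As a sketch of what ignoring character events at parse time could look like, the following uses Python's xml.sax. It is neither WebKit's nor FOP's code: the ELEMENT_ONLY set and the FilteringHandler class are hypothetical, and a real implementation would derive the set from the content model rather than hard-coding it.

```python
import xml.sax

# Hypothetical set of elements in which stray characters make no sense.
ELEMENT_ONLY = {"html", "head", "ul", "ol", "table", "tr"}


class FilteringHandler(xml.sax.ContentHandler):
    """Collects character data, dropping whitespace in element-only parents."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.kept = []     # character data that would become text nodes
        self.dropped = []  # whitespace that never reaches the tree

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        parent = self.stack[-1] if self.stack else None
        if parent in ELEMENT_ONLY and not content.strip():
            self.dropped.append(content)
        else:
            self.kept.append(content)


handler = FilteringHandler()
xml.sax.parseString(b"<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>", handler)
print(repr("".join(handler.kept)), repr("".join(handler.dropped)))
```

The parser may deliver one run of text in several characters() calls, so the sketch accumulates strings instead of counting events.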

Downsides? The live DOM no longer *exactly* reflects the input, so it would 
definitely need to be configurable, just in case one does need that 
functionality. OTOH, let's say that 95% of a site's visitors are not interested 
at all in what the HTML source looks like. If you really want to share your 
code with the other 5%, there are far better ways to do that than relying on 
'View Source', no? For the remainder, I must admit I am having a hard time 
imagining scenarios where ignorable whitespace would be desirable to keep 
around. In the worst case, it could even needlessly complicate certain layout- 
and rendering-related tasks...



Regards,

Andreas Delmelle
---



Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread David Hyatt
On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:

 David, it's a bit more than annoying, it's fragmenting memory for no
 good. In the long run, on systems with small memory it does make a
 difference :-/
 
 I'd like to see some option, maybe compile-time, to strip these
 useless whitespaces.
 

As Alexey points out, this is a compatibility issue though.  People write code 
assuming the whitespace nodes are there.  If you remove them, you'll see Web 
site breakage.

dave
(hy...@apple.com)



Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread Matt 'Murph' Finnicum
On Thu, Jun 17, 2010 at 4:19 PM, David Hyatt hy...@apple.com wrote:

 On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:

  David, it's a bit more than annoying, it's fragmenting memory for no
  good. In the long run, on systems with small memory it does make a
  difference :-/
 
  I'd like to see some option, maybe compile-time, to strip these
  useless whitespaces.
 

 As Alexey points out, this is a compatibility issue though.  People write
 code assuming the whitespace nodes are there.  If you remove them, you'll
 see Web site breakage.

 dave
 (hy...@apple.com)


Do people write code assuming the content of the whitespace nodes? That
seems very unlikely to me. If not, we could collapse them and be much more
efficient about things (such as a simple flag in their parent node that
represents their existence)

--Murph


Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread Gustavo Sverzut Barbieri
On Thu, Jun 17, 2010 at 6:24 PM, Matt 'Murph' Finnicum
mattf...@gmail.com wrote:
 On Thu, Jun 17, 2010 at 4:19 PM, David Hyatt hy...@apple.com wrote:

 On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:

  David, it's a bit more than annoying, it's fragmenting memory for no
  good. In the long run, on systems with small memory it does make a
  difference :-/
 
  I'd like to see some option, maybe compile-time, to strip these
  useless whitespaces.
 

 As Alexey points out, this is a compatibility issue though.  People write
 code assuming the whitespace nodes are there.  If you remove them, you'll
 see Web site breakage.

 dave
 (hy...@apple.com)


 Do people write code assuming the content of the whitespace nodes? That
 seems very unlikely to me. If not, we could collapse them and be much more
 efficient about things (such as a simple flag in their parent node that
 represents their existence)

Same experience on my side: most JS libs that handle whitespace do so
just to ignore the nodes, running the text through strip and then
checking for emptiness.
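
The strip-and-check idiom, applied to the kind of count that started this thread, might look like the sketch below. It assumes Python's xml.dom.minidom rather than a browser DOM, and count_text_nodes is a made-up helper name; the sample markup is far smaller than the slashdot page mentioned earlier.

```python
from xml.dom import minidom


def count_text_nodes(node):
    """Return (all text nodes, whitespace-only text nodes) below node."""
    total = blank = 0
    for child in node.childNodes:
        if child.nodeType == minidom.Node.TEXT_NODE:
            total += 1
            # Strip, then check for emptiness.
            if not child.data.strip():
                blank += 1
        else:
            sub_total, sub_blank = count_text_nodes(child)
            total += sub_total
            blank += sub_blank
    return total, blank


doc = minidom.parseString("<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>")
total, blank = count_text_nodes(doc)
print(total, blank)
```

Even in this tiny pretty-printed list, three of the five text nodes carry nothing but indentation.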


-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems


Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread David Hyatt
On Jun 17, 2010, at 4:07 PM, Eric Seidel wrote:

 A does not follow from B in that sentence, that current memory
 fragmentation means we need to strip whitespace nodes.
 

Yeah exactly.  Let's see some measurements that show the presence of these 
nodes is a real problem.

 It would also be possible to create a special shared \n text node,
 and have some sort of Copy On Write behavior.  Again, more complexity.
 Not sure if the complexity would be worth the perf win.
 

There might be interesting tricks around this idea yeah.  Basically you want 
the common case of never really looking at the DOM from JS to result in 
efficient performance/memory use.  If walking the DOM then results in the 
creation of a more heavyweight whitespace Node, that would be fine I think.

The idea of trying to fold the information into Element is an interesting one.  
Again I think we'd need measurements to see how much of a gain we'd typically 
get if we did this just for newlines for example, or if we'd need an 
optimization that extended to arbitrary whitespace sequences.  I tend to think 
the latter would be required (since cleanly indented HTML will have newlines 
and tabs and spaces for indentation in between elements).
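
One way to picture the "fold it into Element, materialize on demand" idea is the toy model below. It is purely a sketch under assumptions: the Text and Element classes, append_parsed and child_nodes are all hypothetical and have nothing to do with WebCore's actual classes, and in Python the memory saving is only notional.

```python
class Text:
    """Stand-in for a heavyweight DOM text node."""

    def __init__(self, data):
        self.data = data


class Element:
    """Keeps whitespace runs as plain strings until the DOM is walked."""

    def __init__(self, tag):
        self.tag = tag
        self._items = []            # Element objects or raw whitespace strings
        self._materialized = False

    def append_parsed(self, item):
        # Parser-side entry point: whitespace arrives as a plain string.
        if self._materialized and isinstance(item, str):
            item = Text(item)
        self._items.append(item)

    @property
    def child_nodes(self):
        # The first access from script replaces each string with a real Text
        # node, so later accesses keep returning the same node objects.
        if not self._materialized:
            self._items = [Text(i) if isinstance(i, str) else i
                           for i in self._items]
            self._materialized = True
        return self._items


ul = Element("ul")
for item in ("\n  ", Element("li"), "\n  ", Element("li"), "\n"):
    ul.append_parsed(item)

cheap = not ul._materialized   # nothing heavyweight has been created yet
nodes = ul.child_nodes         # walking the DOM creates the Text nodes
print(cheap, len(nodes), repr(nodes[0].data))
```

Script still sees five children at the expected positions, which is the compatibility property this thread cares about; whether the saving justifies the complexity is exactly what the requested measurements would have to show.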

I'm not particularly interested in a break-the-web mode that strips out those 
Nodes completely by default, since (a) breaking compatibility is bad and (b) 
nobody has shown me any evidence that these extra nodes are a problem on mobile 
devices.

An optimization that reduces memory use while preserving the correct behavior 
is much more appealing to me.

dave
(hy...@apple.com)



Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread Trey Matteson
I have seen sites that make this assumption.  Even worked on one.  From a 
website writer's view, the whitespace nodes are usually a pain, but if you add 
some hacks to skip around them (as ugly as that is), you expect those hacks to 
keep working.

trey



On Jun 17, 2010, at 2:19 PM, David Hyatt wrote:

 On Jun 17, 2010, at 2:45 PM, Gustavo Sverzut Barbieri wrote:
 
 David, it's a bit more than annoying, it's fragmenting memory for no
 good. In the long run, on systems with small memory it does make a
 difference :-/
 
 I'd like to see some option, maybe compile-time, to strip these
 useless whitespaces.
 
 
 As Alexey points out, this is a compatibility issue though.  People write 
 code assuming the whitespace nodes are there.  If you remove them, you'll see 
 Web site breakage.
 
 dave
 (hy...@apple.com)
 


Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-17 Thread David Hyatt
I filed https://bugs.webkit.org/show_bug.cgi?id=40800 to track this issue.  I 
think we can take further discussion to the bug.

dave
(hy...@apple.com)



Re: [webkit-dev] Why so many text nodes in the DOM? (especially ones with just whitespace)

2010-06-16 Thread David Hyatt
Whitespace nodes most commonly occur between elements, so they can't be 
coalesced.

dave

On Jun 14, 2010, at 7:00 PM, Matt 'Murph' Finnicum wrote:

 Why are there so many Text nodes in the DOM? I had a look at the initial DOM 
 tree from rendering slashdot, and there are 1959 Text nodes. Of those 1959, 
 1246 were whitespace-only nodes.
 
 Does there need to be this many nodes? Why can't whitespace be combined with 
 the nodes next to it?
 
 Thanks,
 --Murph