Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Adam Barth
On Mon, Oct 8, 2012 at 6:21 PM, Maciej Stachowiak m...@apple.com wrote:
 On Oct 8, 2012, at 5:28 PM, Adam Barth aba...@webkit.org wrote:
 On Mon, Oct 8, 2012 at 2:17 PM, Brady Eidson beid...@apple.com wrote:
 On Oct 8, 2012, at 12:17 PM, Adam Barth aba...@webkit.org wrote:

 Would there be any design or implementation constraints on WebCore?
 For example, would WebCore need to understand any concurrency or
 performance issues caused by the memory being shared between
 processes?

 For now we think the answer is no, or that any parts that do need to be
 concerned will be wholly encapsulated within the support for the client model.

 Ok.  If there are no design implications for WebCore, then I don't
 have a problem with this work continuing.

 Based on my experience with this topic in Chromium, I believe you're
 over-engineering, but if you're unwilling to share your data, I doubt
 further discussion will cause either of us to change our minds.

 You can expect that we'll collect and share some data as the work progresses. 
 The fact is that we don't have really great data to share yet, we are still 
 in an exploratory phase. If you have any past data to share, we'd love to 
 look at it.

Unfortunately, I don't have the data from our previous experiments anymore.

 One preliminary finding of ours is that different web pages fairly often load 
 identical resource bodies from different URLs. We expect possible benefits 
 from sharing the body data of resources in memory even if we cannot share the 
 URL or response headers.

 You can also expect that we won't push forward blindly on this effort if data 
 ultimately shows it to be a bad idea, or in general not worth the complexity.

As I mentioned in bugs.webkit.org, we did our experiments a number of
years ago, and it's certainly possible that the web has changed (e.g.,
social widgets have gained a lot of popularity in the intervening
time).  Maybe we should run some experiments as well and reconsider
Chromium's approach.

Adam
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] New WebKit Reviewer: Caio Marcelo de Oliveira Filho

2012-10-09 Thread Antonio Gomes
Congratulations, Caio! :-)

On Mon, Oct 8, 2012 at 10:32 PM, Zoltan Horvath zol...@webkit.org wrote:


 Cool! Congratulations Caio!

 On Mon, Oct 8, 2012 at 2:20 PM, noam.rosent...@nokia.com wrote:

 I am pleased to announce that Caio (@cmarcelo) is now a WebKit reviewer.
 Caio has been working in many areas over the past years, starting with a
 focus on the Qt port, with improvements to the Qt/JS bridge, font test
 results and render-theme.
 More recently he has also worked on the UndoManager and other aspects of
 editing code and CSS parsing, his latest contributions being around WTF
 HashMap iterators.
 Since Caio has been involved in so many parts of WebKit, I'm probably
 forgetting a few other contributions...

 Please join me in congratulating Caio for his new reviewer status!
 No'am




-- 
--Antonio Gomes


Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Antti Koivisto
On Tue, Oct 9, 2012 at 4:21 AM, Maciej Stachowiak m...@apple.com wrote:

 One preliminary finding of ours is that different web pages fairly often
 load identical resource bodies from different URLs. We expect possible
 benefits from sharing the body data of resources in memory even if we
 cannot share the URL or response headers.


I did some preliminary profiling of this earlier. The data is not
necessarily very representative of the web as a whole; I simply browsed
through a number of popular sites with an instrumented build
(https://bug-98539-attachments.webkit.org/attachment.cgi?id=167753). Here is
what I had in the WebCore memory cache in the end:

TOTAL duplicates count=443 size=852791 decodedSize=3426638
all cached resources count=2636 size=116463712

17% of cache entries are exact copies of previous entries. They use 4.3MB
out of a total cache size of 116MB (4%). The average size of duplicate entries
is much smaller than the average entry size (not surprisingly: transparent
1x1 images, etc.).
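
As a quick check of the percentages above, the figures can be recomputed from the two summary lines. This sketch assumes the 4.3MB figure is the sum of size and decodedSize for the duplicates, which the message does not state explicitly; the variable names are made up for illustration.

```python
# Figures copied from the two summary lines of the cache dump.
duplicate_count = 443
duplicate_encoded_bytes = 852_791
duplicate_decoded_bytes = 3_426_638
total_count = 2_636
total_bytes = 116_463_712

# Assumption: "4.3MB" is encoded size plus decoded size of the duplicates.
duplicate_bytes = duplicate_encoded_bytes + duplicate_decoded_bytes

count_share = duplicate_count / total_count
byte_share = duplicate_bytes / total_bytes
avg_duplicate = duplicate_bytes / duplicate_count
avg_entry = total_bytes / total_count

print(f"duplicate entries: {count_share:.1%} of all entries")
print(f"duplicate bytes:   {duplicate_bytes / 1e6:.1f} MB ({byte_share:.1%} of cache)")
print(f"average duplicate: {avg_duplicate:.0f} bytes, average entry: {avg_entry:.0f} bytes")
```

Under that assumption the shares round to the 17% and 4% quoted, and the average duplicate is several times smaller than the average entry.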

Some examples:

HTTP vs. HTTPS:

hash=5d90af size=91556 decodedSize=34834
https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js
hash=5d90af size=91556 decodedSize=32
http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js

(decodedSize for JS resources is the size of the function offset cache,
which can differ depending on which functions have been parsed.)

Ad scripts with deliberately unique URLs:

hash=6f6494 size=3111 decodedSize=32
http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=76273718?
hash=6f6494 size=3111 decodedSize=648
http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=711567314?
hash=6f6494 size=3111 decodedSize=32
http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=908690779?
hash=6f6494 size=3111 decodedSize=32
http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=876087584?

Same framework loaded by multiple unrelated sites:

hash=230c7d size=1958 decodedSize=0
http://static.iltalehti.fi/js/measure/spring.js
hash=230c7d size=1958 decodedSize=1536
http://yle.fi/global/spring/spring.js

(I didn't see as many of these as expected, though I'm sure there are sites
that run into this.)

Copy-paste resources:

hash=8fb3e size=3965 decodedSize=13088
http://store.storeimages.cdn-apple.com/2149/store.apple.com/rs/source/store/base/nav/globalnav/css/bg/globalsearch_spinner.gif
hash=8fb3e size=3965 decodedSize=13088
http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/source/store/features/search/css/bg/spinner.gif
hash=8fb3e size=3965 decodedSize=13088
http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/source/store/base/nav/globalnav/css/bg/globalsearch_spinner.gif
hash=8fb3e size=3965 decodedSize=13088
http://images.apple.com/global/nav/images/globalsearch_spinner.gif

Versioning:

hash=cecbad size=184899 decodedSize=0
http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/applestore-rs-2.css
hash=cecbad size=184899 decodedSize=0
http://store.storeimages.cdn-apple.com/2149/store.apple.com/rs/applestore-rs-2.css

hash=47a8de size=20041 decodedSize=7870
http://js.t.sinajs.cn/open/analytics/js/suda.js?version=b4d67909ad6b5b7d
hash=47a8de size=20041 decodedSize=7870
http://tjs.sjs.sinajs.cn/open/analytics/js/suda.js

Multiple content servers:

hash=8e976b size=808 decodedSize=13088
http://i0.sinaimg.cn/home/deco/2008/0329/sinahome_0803_ws_003_new.gif
hash=8e976b size=808 decodedSize=71700
http://i3.sinaimg.cn/home/deco/2008/0329/sinahome_0803_ws_003_new.gif

Based on this I think this is definitely worth at least looking into further.
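
A grouping like the one in these examples could be produced by hashing each cached body and collecting the URLs that share a digest. The sketch below is only an illustration: the hash used by the instrumented build is not stated in this thread, so SHA-256 is an assumption, as are the find_duplicates helper, the placeholder bodies, and the example.com URL.

```python
import hashlib
from collections import defaultdict


def find_duplicates(entries):
    """Group (url, body) pairs by a digest of the body bytes.

    Returns {digest: [urls]} for every digest stored under more than one URL.
    """
    urls_by_digest = defaultdict(list)
    for url, body in entries:
        digest = hashlib.sha256(body).hexdigest()
        urls_by_digest[digest].append(url)
    return {d: urls for d, urls in urls_by_digest.items() if len(urls) > 1}


# Hypothetical cache contents; the bodies are placeholders, not real payloads.
jquery_body = b"/* placeholder for jquery.min.js */"
entries = [
    ("https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js", jquery_body),
    ("http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js", jquery_body),
    ("http://example.com/unique.css", b"body { margin: 0 }"),
]

duplicates = find_duplicates(entries)
for digest, urls in duplicates.items():
    print(digest[:6], len(urls), "URLs")
    for url in urls:
        print("   ", url)
```

Only the two jQuery URLs are reported, since the third entry has a body of its own.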


  antti





Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Adam Barth
On Tue, Oct 9, 2012 at 7:52 AM, Antti Koivisto koivi...@iki.fi wrote:
 On Tue, Oct 9, 2012 at 4:21 AM, Maciej Stachowiak m...@apple.com wrote:
 One preliminary finding of ours is that different web pages fairly often
 load identical resource bodies from different URLs. We expect possible
 benefits from sharing the body data of resources in memory even if we cannot
 share the URL or response headers.

 I did some preliminary profiling of this earlier. The data is not
 necessarily very representative of the web as a whole; I simply browsed
 through a number of popular sites with an instrumented build
 (https://bug-98539-attachments.webkit.org/attachment.cgi?id=167753). Here is
 what I had in the WebCore memory cache in the end:

 TOTAL duplicates count=443 size=852791 decodedSize=3426638
 all cached resources count=2636 size=116463712

 17% of cache entries are exact copies of previous entries. They use 4.3MB
 out of total 116MB cache size (4%). The average size of duplicate entries is
 much smaller than the average entry size (not surprisingly, transparent 1x1
 images etc).

 Some examples:

 HTTP vs. HTTPS:

 hash=5d90af size=91556 decodedSize=34834
 https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js
 hash=5d90af size=91556 decodedSize=32
 http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js

 (decodedSize for JS resources is the size of the function offset cache,
 which can differ depending on which functions have been parsed.)

 Ad scripts with deliberately unique URLs:

 hash=6f6494 size=3111 decodedSize=32
 http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=76273718?
 hash=6f6494 size=3111 decodedSize=648
 http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=711567314?
 hash=6f6494 size=3111 decodedSize=32
 http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=908690779?
 hash=6f6494 size=3111 decodedSize=32
 http://ad.doubleclick.net/adj/teg.fmsq/j2ek;subs=n;wsub=n;sdn=n;dcopt=ist;pos=ldr_top;sz=728x90,970x90;tile=1;ord=876087584?

 Same framework loaded by multiple unrelated sites:

 hash=230c7d size=1958 decodedSize=0
 http://static.iltalehti.fi/js/measure/spring.js
 hash=230c7d size=1958 decodedSize=1536
 http://yle.fi/global/spring/spring.js

 (I didn't see as many of these as expected, though I'm sure there are sites
 that run into this.)

 Copy-paste resources:

 hash=8fb3e size=3965 decodedSize=13088
 http://store.storeimages.cdn-apple.com/2149/store.apple.com/rs/source/store/base/nav/globalnav/css/bg/globalsearch_spinner.gif
 hash=8fb3e size=3965 decodedSize=13088
 http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/source/store/features/search/css/bg/spinner.gif
 hash=8fb3e size=3965 decodedSize=13088
 http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/source/store/base/nav/globalnav/css/bg/globalsearch_spinner.gif
 hash=8fb3e size=3965 decodedSize=13088
 http://images.apple.com/global/nav/images/globalsearch_spinner.gif

 Versioning:

 hash=cecbad size=184899 decodedSize=0
 http://store.storeimages.cdn-apple.com/2150/store.apple.com/rs/applestore-rs-2.css
 hash=cecbad size=184899 decodedSize=0
 http://store.storeimages.cdn-apple.com/2149/store.apple.com/rs/applestore-rs-2.css

 hash=47a8de size=20041 decodedSize=7870
 http://js.t.sinajs.cn/open/analytics/js/suda.js?version=b4d67909ad6b5b7d
 hash=47a8de size=20041 decodedSize=7870
 http://tjs.sjs.sinajs.cn/open/analytics/js/suda.js

 Multiple content servers:

 hash=8e976b size=808 decodedSize=13088
 http://i0.sinaimg.cn/home/deco/2008/0329/sinahome_0803_ws_003_new.gif
 hash=8e976b size=808 decodedSize=71700
 http://i3.sinaimg.cn/home/deco/2008/0329/sinahome_0803_ws_003_new.gif

 Based on this I think this is definitely worth at least looking into further.

This is interesting data, but it seems to be related to whether we
should make the MemoryCache content addressable rather than whether we
should use shared memory to back the MemoryCache when there are
multiple WebProcesses.

Adam







Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Antti Koivisto
On Tue, Oct 9, 2012 at 10:02 PM, Adam Barth aba...@webkit.org wrote:

 This is interesting data, but it seems to be related to whether we
 should make the MemoryCache content addressable rather than whether we
 should use shared memory to back the MemoryCache when there are
 multiple WebProcesses.


It is relevant when considering if and how to share cache data between
processes. It is also interesting in the single-process case. Brady's
refactoring should be helpful for both scenarios.


  antti


 
 
 



Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Adam Barth
On Tue, Oct 9, 2012 at 12:17 PM, Antti Koivisto koivi...@iki.fi wrote:
 On Tue, Oct 9, 2012 at 10:02 PM, Adam Barth aba...@webkit.org wrote:
 This is interesting data, but it seems to be related to whether we
 should make the MemoryCache content addressable rather than whether we
 should use shared memory to back the MemoryCache when there are
 multiple WebProcesses.

 It is relevant when considering if and how to share cache data between
 processes. It is also interesting in the single-process case. Brady's
 refactoring should be helpful for both scenarios.

Content-addressable caches are quite interesting.  There are a couple
benefits you could hope to achieve:

1) Reduced memory usage by deduping cached values.  The data you
mentioned seems mostly about this benefit.

2) Reduced latency by increasing the cache hit rate for
duplicated entries.  This one is trickier without cooperation from the
server because you don't know the hash of the resource until you've
already received it.

We've had a couple of customers ask about (2), but there are some
tricky security problems because you end up leaking the identity of
cross-origin resources in the timing channel.  Aiming for (1) also
carries some of that risk because you'll leak the identity of
cross-origin resources in the cache eviction channel (which can be
probed with timing or network traffic), but it's likely not as big a
problem.
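
To make the two benefits concrete, here is a minimal sketch of a content-addressable in-memory cache. It is not WebCore's MemoryCache: the class and method names are hypothetical, SHA-256 is an assumed hash, and eviction is left out entirely. It illustrates benefit (1), one stored body for many URLs, and why benefit (2) is harder: lookups are still by URL, so a duplicate under a new URL misses until its body has been fetched and hashed.

```python
import hashlib


class ContentAddressedCache:
    """Toy cache: URLs map to digests, and each digest maps to one shared body."""

    def __init__(self):
        self._digest_by_url = {}
        self._body_by_digest = {}

    def put(self, url, body):
        # The digest is only known once the full body has been received.
        digest = hashlib.sha256(body).hexdigest()
        # setdefault keeps the first copy of a body; later identical copies
        # are not stored again.
        self._body_by_digest.setdefault(digest, body)
        self._digest_by_url[url] = digest

    def get(self, url):
        digest = self._digest_by_url.get(url)
        return None if digest is None else self._body_by_digest[digest]

    def stored_bytes(self):
        return sum(len(body) for body in self._body_by_digest.values())


cache = ContentAddressedCache()
body = b"x" * 100
cache.put("https://a.example/lib.js", body)
cache.put("http://b.example/lib.js", bytes(body))

print(cache.stored_bytes())                   # 100: one body shared by two URLs
print(cache.get("http://c.example/lib.js"))   # None: same content, unseen URL
```

A real design would also need reference counting or similar so that a body is dropped only when the last URL pointing at it is evicted.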

Adam


Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Maciej Stachowiak

On Oct 9, 2012, at 1:24 PM, Adam Barth aba...@webkit.org wrote:

 On Tue, Oct 9, 2012 at 12:17 PM, Antti Koivisto koivi...@iki.fi wrote:
 On Tue, Oct 9, 2012 at 10:02 PM, Adam Barth aba...@webkit.org wrote:
 This is interesting data, but it seems to be related to whether we
 should make the MemoryCache content addressable rather than whether we
 should use shared memory to back the MemoryCache when there are
 multiple WebProcesses.
 
 It is relevant when considering if and how to share cache data between
 processes. It is also interesting in the single-process case. Brady's
 refactoring should be helpful for both scenarios.
 
 Content-addressable caches are quite interesting.  There are a couple
 benefits you could hope to achieve:
 
 1) Reduced memory usage by deduping cached values.  The data you
 mentioned seems mostly about this benefit.
 
 2) Reduced latency by increasing the cache hit rate for
 duplicated entries.  This one is trickier without cooperation from the
 server because you don't know the hash of the resource until you've
 already received it.
 
 We've had a couple of customers ask about (2), but there are some
 tricky security problems because you end up leaking the identity of
 cross-origin resources in the timing channel.  Aiming for (1) also
 carries some of that risk because you'll leak the identity of
 cross-origin resources in the cache eviction channel (which can be
 probed with timing or network traffic), but it's likely not as big a
 problem.

We're mainly interested in (1), with the corollary of greater cache 
effectiveness at equivalent total cache size (so you can think of the benefit 
as an indirect speed win rather than as just a memory win).

Regards,
Maciej



Re: [webkit-dev] Discussing bug 98539 - Refactor resource loading to allow for out-of-process loading and memory caching

2012-10-09 Thread Adam Barth
On Tue, Oct 9, 2012 at 1:31 PM, Maciej Stachowiak m...@apple.com wrote:
 On Oct 9, 2012, at 1:24 PM, Adam Barth aba...@webkit.org wrote:
 On Tue, Oct 9, 2012 at 12:17 PM, Antti Koivisto koivi...@iki.fi wrote:
 On Tue, Oct 9, 2012 at 10:02 PM, Adam Barth aba...@webkit.org wrote:
 This is interesting data, but it seems to be related to whether we
 should make the MemoryCache content addressable rather than whether we
 should use shared memory to back the MemoryCache when there are
 multiple WebProcesses.

 It is relevant when considering if and how to share cache data between
 processes. It is also interesting in the single-process case. Brady's
 refactoring should be helpful for both scenarios.

 Content-addressable caches are quite interesting.  There are a couple
 benefits you could hope to achieve:

 1) Reduced memory usage by deduping cached values.  The data you
 mentioned seems mostly about this benefit.

 2) Reduced latency by increasing the cache hit rate for
 duplicated entries.  This one is trickier without cooperation from the
 server because you don't know the hash of the resource until you've
 already received it.

 We've had a couple of customers ask about (2), but there are some
 tricky security problems because you end up leaking the identity of
 cross-origin resources in the timing channel.  Aiming for (1) also
 carries some of that risk because you'll leak the identity of
 cross-origin resources in the cache eviction channel (which can be
 probed with timing or network traffic), but it's likely not as big a
 problem.

 We're mainly interested in (1), with the corollary of greater cache 
 effectiveness at equivalent total cache size (so you can think of the benefit 
 as an indirect speed win rather than as just a memory win).

That raises the question of what the cache-size to hit-rate curve
looks like.  I don't think that's something we've ever measured for
the MemoryCache, but it would be interesting to know, for example,
whether increasing the cache size by 4% increases the cache hit rate
by more or less than 4%.
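
One rough way to explore that curve offline is to replay a request trace against caches of different sizes. The sketch below assumes a plain LRU policy with capacity counted in entries rather than bytes, and a synthetic Pareto-distributed trace; none of that is claimed to match the MemoryCache's real eviction policy or real browsing traffic, and lru_hit_rate is a hypothetical helper.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay `trace` (a list of resource keys) through an LRU cache that
    holds at most `capacity` entries and return the fraction of hits."""
    if not trace:
        return 0.0
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace)


# Synthetic, skewed request trace (an assumption, not measured data).
rng = random.Random(0)
trace = [int(rng.paretovariate(1.2)) for _ in range(50_000)]

base = lru_hit_rate(trace, 100)
for capacity in (100, 104, 200):
    rate = lru_hit_rate(trace, capacity)
    print(f"capacity={capacity:4d} hit rate={rate:.4f} change={rate - base:+.4f}")
```

Comparing the 100-entry and 104-entry rows gives the kind of "4% more cache" comparison asked about here, though the answer depends heavily on how skewed the real trace is.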

Adam