Re: An analysis of content process memory overhead

2016-04-14 Thread Nicholas Nethercote
On Mon, Mar 21, 2016 at 3:50 PM, Nicholas Nethercote wrote:
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.

I reduced the page-cache size from 4 MiB to 1 MiB in bug 1258257,
saving up to 3 MiB per process. There was no discernible performance
impact.

> On Linux64, libxul contains about 5.3 MiB of static data.

I've done some work to reduce this, as has Nathan Froyd, mostly under
bug 1254777.

I just did local Linux64 builds of the release branch and
mozilla-inbound. The 'data' measurement provided by the |size| utility
has dropped from 5,515,676 to 4,683,616 bytes, a reduction of 832,060
bytes.

I also double-checked this by enabling memory.system_memory_reporter
(which provides detailed OS-level memory measurements; Linux-only) and
then looking at the appropriate "libxul.so/[rw-p]" entry in about:memory.
The change there was from 5,472,256 to 4,685,824 bytes, a reduction of
786,432 bytes. I'm not sure why these numbers don't quite match the
|size| numbers -- a difference between on-disk and in-memory
representations, perhaps? Nonetheless, they're similar.
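
For anyone wanting to reproduce this, enabling that reporter just means
flipping the pref from a privileged (chrome) context such as the Browser
Console before re-measuring -- a minimal sketch, nothing more:

    // Sketch: enable the Linux-only OS-level memory reporters, then re-run
    // the measurement in about:memory. Assumes a privileged (chrome)
    // context, e.g. the Browser Console.
    Components.utils.import("resource://gre/modules/Services.jsm");
    Services.prefs.setBoolPref("memory.system_memory_reporter", true);
    // Now reload about:memory and press "Measure"; the per-mapping entries
    // (e.g. libxul.so/[rw-p]) show up among the OS-level reports.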


So that's the good news. The bad news is (a) there's still a long way
to go before we can reasonably ship with more than 2 or perhaps 4
content processes enabled, (b) those changes above represent the
lowest-hanging fruit I could find, and (c) I will have very limited
time to work further on this in the medium-term.

Nick


Re: An analysis of content process memory overhead

2016-03-20 Thread Nicholas Nethercote
On Tue, Mar 15, 2016 at 2:34 PM, Nicholas Nethercote wrote:
>
>
> ----------
> Conclusion
> ----------
>
> The overhead per content process is significant. I can see scope for
> moderate improvements, but I'm having trouble seeing how big improvements
> can be made. Without big improvements, scaling the number of content
> processes beyond 4 (*maybe* 8) won't be possible.
>
> - JS overhead is the biggest factor. We execute a lot of JS code just
>   starting up for each content process -- can that be reduced? We should
>   also consider a smaller nursery size limit for content processes.
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.
>
> - Static data is a big chunk. It's hard to make much of a dent there
>   because it has a *very* long tail.
>
> - The remaining buckets are a lot smaller.

Just to expand upon that, here are the top-level numbers for all three
platforms, both small and large processes. For this computation I assumed
that "explicit" memory is entirely a subset of "resident-unique", which is
probably true or very close to it. (Note: this data looks best with a
fixed-width font.)

Linux64, small processes
- resident-unique       38.1 MiB (100%)
  - explicit          - 22.8 MiB (60%)
    - js-non-window   - 11.2 MiB (29%)
    - other           -  7.8 MiB (20%)
    - heap-overhead   -  3.8 MiB (10%)
  - static?           - 15.3 MiB (40%)

Linux64, large processes
- resident-unique       52.6 MiB (100%)
  - explicit          - 38.7 MiB (74%)
    - js-non-window   - 22.3 MiB (42%)
    - other           -  9.8 MiB (19%)
    - heap-overhead   -  6.6 MiB (13%)
  - static?           - 13.9 MiB (26%)

Mac64, small processes
- resident-unique       49.3 MiB (100%)
  - static?           - 27.9 MiB (57%)
  - explicit          - 21.4 MiB (43%)
    - js-non-windows  - 11.1 MiB (23%)
    - other           -  6.9 MiB (14%)
    - heap-overhead   -  3.4 MiB ( 7%)

Mac64, large processes
- resident-unique       59.4 MiB (100%)
  - explicit          - 30.1 MiB (51%)
    - js-non-windows  - 15.7 MiB (26%)
    - heap-overhead   -  7.7 MiB (13%)
    - other           -  6.7 MiB (11%)
  - static?           - 29.3 MiB (49%)

Win32, small processes
- resident-unique       39.3 MiB (100%)
  - static?           - 23.4 MiB (60%)
  - explicit          - 15.9 MiB (40%)
    - js-non-windows  -  8.4 MiB (21%)
    - heap-overhead   -  3.8 MiB (10%)
    - other           -  3.7 MiB ( 9%)

Win32, large processes
- resident-unique       51.6 MiB (100%)
  - explicit          - 28.5 MiB (55%)
    - js-non-windows  - 16.1 MiB (31%)
    - heap-overhead   -  6.8 MiB (13%)
    - other           -  5.6 MiB (11%)
  - static?           - 23.1 MiB (45%)

The "resident-unique" increases by 38--59 MiB per content process. That's a
bit
lower than erahm got in his measurements, possibly because his methodology
involved doing a lot more work in each content process.

Of that increase:
- "static?" accounts for 26--60%
- "explicit/js-non-windows" accounts for 21--42%
- "explicit/heap-overhead" accounts for 7--13%
- "explicit/other" (everything not accounted for by the above three lines)
  accounts for 9--20%

About the "static?" measure -- On Linux64, libxul contains about 5.3 MiB of
static data. Other libraries used by Firefox contain much less. So I don't
know
what else is being measured in the "static?" number (i.e. what accounts for
the
change in difference between "resident-unique" and "explicit").
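
As a sanity check, the "static?" rows above are simply resident-unique minus
explicit, under the stated assumption that explicit is (almost) entirely a
subset of resident-unique. A small sketch using the Linux64 figures:

    // Sketch: derive "static?" from the two measured totals (MiB).
    var measurements = {
      "Linux64 small": { residentUnique: 38.1, explicit: 22.8 },
      "Linux64 large": { residentUnique: 52.6, explicit: 38.7 },
    };
    Object.keys(measurements).forEach(function (name) {
      var m = measurements[name];
      var staticEstimate = m.residentUnique - m.explicit;   // 15.3, 13.9
      var share = 100 * staticEstimate / m.residentUnique;  // 40%, 26%
      console.log(name + ": static? = " + staticEstimate.toFixed(1) +
                  " MiB (" + share.toFixed(0) + "%)");
    });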

Nick


Re: An analysis of content process memory overhead

2016-03-20 Thread Till Schneidereit
I filed bug 876173[1] about this a long time ago. Recently, I talked to
Gabor, who's started looking into enabling multiple content processes.

One other thing we should be able to do is share the self-hosting
compartment between processes, as we already do between runtimes within a
process. It's not that big, but it's not nothing, either.

till


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=876173

On Tue, Mar 15, 2016 at 4:34 AM, Nicholas Nethercote  wrote:

> Greetings,
>
> erahm recently wrote a nice blog post with measurements showing the
> overhead of enabling multiple content processes:
>
> http://www.erahm.org/2016/02/11/memory-usage-of-firefox-with-e10s-enabled/
>
> The overhead is high -- 8 content processes *doubles* our physical memory
> usage -- which limits the possibility of increasing the number of content
> processes beyond a small number. Now I've done some follow-up
> measurements to find out what is causing the per-content-process overhead.
>
> I did this by measuring memory usage with four trivial web pages open, first
> with a single content process, then with four content processes, and then
> getting the diff between content processes of the two. (about:memory's diff
> algorithm normalizes PIDs in memory reports as "NNN" so multiple content
> processes naturally get collapsed together, which in this case is exactly
> what we want.) I call this the "small processes" measurement.
>
> If we divide the memory usage increase by 3 (the increase in the number of
> content processes) we get a rough measure of the minimum per-content-process
> overhead.
>
> I then did a similar thing but with four more complex web pages (gmail,
> Google Docs, TreeHerder, Bugzilla). I call this the "large processes"
> measurement.
>
>
[ lots of analysis omitted to not get caught in the 40kb+ moderation queue ]

> ----------
> Conclusion
> ----------
>
> The overhead per content process is significant. I can see scope for
> moderate improvements, but I'm having trouble seeing how big improvements
> can be made. Without big improvements, scaling the number of content
> processes beyond 4 (*maybe* 8) won't be possible.
>
> - JS overhead is the biggest factor. We execute a lot of JS code just
>   starting up for each content process -- can that be reduced? We should
>   also consider a smaller nursery size limit for content processes.
>
> - Heap overhead is significant. Reducing the page-cache size could save a
>   couple of MiBs. Improvements beyond that are hard. Turning on jemalloc4
>   *might* help a bit, but I wouldn't bank on it, and there are other
>   complications with that.
>
> - Static data is a big chunk. It's hard to make much of a dent there
>   because it has a *very* long tail.
>
> - The remaining buckets are a lot smaller.
>
> I'm happy to give copies of the raw data files to anyone who wants to look
> at them in more detail.
>
> Nick


Re: An analysis of content process memory overhead

2016-03-19 Thread Nicolas B. Pierron

On 03/17/2016 08:05 AM, Thinker Li wrote:

On Wednesday, March 16, 2016 at 10:22:40 PM UTC+8, Nicholas Nethercote wrote:

   Even if we can fix that, it's just a lot of JS code. We can lazily import
   JSMs; I wonder if we are failing to do that as much as we could, i.e. are
   all these modules really needed at start-up? It would be great if we
   could instrument module-loading code in some way that answers this question.
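
For context, lazy module loading in chrome JS typically looks like the sketch
below; the module named here is only an illustrative choice, not a claim
about which modules are (or aren't) loaded lazily today:

    // Sketch: the module is not read or compiled until the getter is first
    // touched, so modules that a content process never uses cost nothing at
    // startup.
    Components.utils.import("resource://gre/modules/XPCOMUtils.jsm");

    XPCOMUtils.defineLazyModuleGetter(this, "NetUtil",
                                      "resource://gre/modules/NetUtil.jsm");

    function firstUse(uri) {
      // NetUtil.jsm is loaded here, on first property access.
      return NetUtil.newChannel({ uri, loadUsingSystemPrincipal: true });
    }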


B2G also dropped JS source, on the Tarako branch, since the source is useless
for a loaded module except for stringifying functions. (Gecko compresses
in-memory source.) But I am not sure whether that was ever landed on m-c.


Note that this worked on B2G, but it would not work for Gecko. For example,
all tab add-ons have to use toSource to patch the JS functions.


Source compression should already be enabled. I think we do not do it for
small sources, nor for huge sources, as the compression would either be
useless or take a noticeable amount of time.


--
Nicolas B. Pierron


Re: An analysis of content process memory overhead

2016-03-19 Thread Gabriele Svelto
On 15/03/2016 04:34, Nicholas Nethercote wrote:
> - "heap-overhead" is 4 MiB per process. I've looked at this closely.
>   The numbers tend to be noisy.
> 
>   - "page-cache" is pages that jemalloc holds onto for fast recycling. It is
> capped at 4 MiB per process and we can reduce that with a jemalloc
> configuration, though this may make allocation slightly slower.

We aggressively got rid of that on B2G by sending memory-pressure events
to apps that were unused. We did have the advantage there that each process
hosted only one page, so establishing whether a process was unused was very
easy. On desktop Firefox we might consider trying to minimize the memory
usage of processes which do not have active tabs (e.g. none of the tabs is
visible, or none of the tabs has received input for a while).
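
For reference, the desktop analogue would presumably build on the existing
"memory-pressure" observer notification; a minimal sketch, with the policy
for deciding *when* to fire it being the genuinely new part:

    // Sketch: ask a process to shed memory, roughly what B2G sent to unused
    // apps. "heap-minimize" tells memory-pressure observers (JS engine,
    // image/style caches, jemalloc, ...) to purge what they can.
    Components.utils.import("resource://gre/modules/Services.jsm");

    function minimizeMemoryUsage() {
      Services.obs.notifyObservers(null, "memory-pressure", "heap-minimize");
    }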

Besides the immediate reduction in memory usage, this had the important
side-effect of reducing steady-state consumption. A lot of the structures
and caches that were purged had often been bloated by transient data
required only during startup. Once minimized they would start to grow again
when a process became active again, but never as much as before the
minimization.

 Gabriele



Re: An analysis of content process memory overhead

2016-03-19 Thread David Rajchenbach-Teller
I seem to remember that our ChromeWorkers (SessionWorker,
PageThumbsWorker, OS.File Worker) were pretty memory-hungry, but I don't
see any workers there. Does this mean that they have negligible overhead
or that they are only in the parent process?

Cheers,
 David

On 15/03/16 04:34, Nicholas Nethercote wrote:
> Greetings,
> 
> erahm recently wrote a nice blog post with measurements showing the
> overhead of enabling multiple content processes:
> 
> http://www.erahm.org/2016/02/11/memory-usage-of-firefox-with-e10s-enabled/
> 
> The overhead is high -- 8 content processes *doubles* our physical memory
> usage -- which limits the possibility of increasing the number of content
> processes beyond a small number. Now I've done some follow-up
> measurements to find out what is causing the per-content-process overhead.
> 
> I did this by measuring memory usage with four trivial web pages open, first
> with a single content process, then with four content processes, and then
> getting the diff between content processes of the two. (about:memory's diff
> algorithm normalizes PIDs in memory reports as "NNN" so multiple content
> processes naturally get collapsed together, which in this case is exactly
> what we want.) I call this the "small processes" measurement.
> 
> If we divide the memory usage increase by 3 (the increase in the number of
> content processes) we get a rough measure of the minimum per-content process
> overhead.
> 


Re: An analysis of content process memory overhead

2016-03-19 Thread Ben Kelly
On Thu, Mar 17, 2016 at 9:50 AM, Nicolas B. Pierron <nicolas.b.pier...@mozilla.com> wrote:

> Source compression should already be enabled. I think we do not do it for
> small sources, nor for huge sources, as the compression would either be
> useless or take a noticeable amount of time.
>

I think Luke suggested that we could compress larger JS sources off the
main thread if we implemented this bug:

  https://bugzilla.mozilla.org/show_bug.cgi?id=1001231

It's been in my queue for 2 years, unfortunately. If anyone wants to make
that happen, please feel free to steal it. :-)

Ben


Re: An analysis of content process memory overhead

2016-03-19 Thread Nicholas Nethercote
On Fri, Mar 18, 2016 at 2:29 AM, David Rajchenbach-Teller <dtel...@mozilla.com> wrote:
>
> I seem to remember that our ChromeWorkers (SessionWorker,
> PageThumbsWorker, OS.File Worker) were pretty memory-hungry, but I don't
> see any workers there. Does this mean that they have negligible overhead
> or that they are only in the parent process?

I checked the data again: they are only in the parent process, so they
don't affect content process scaling. And they're not *that* big -- here's
the biggest I saw in my data (from the Mac "large processes" data):

> 6.33 MB (04.00%) -- workers/workers(chrome)
> ├──2.15 MB (01.36%) ++ worker(resource://gre/modules/osfile/osfile_async_worker.js, 0x113881800)
> ├──2.11 MB (01.33%) ++ worker(resource:///modules/sessionstore/SessionWorker.js, 0x1297e7800)
> └──2.06 MB (01.30%) ++ worker(resource://gre/modules/PageThumbsWorker.js, 0x1169c1000)

Nick


Re: An analysis of content process memory overhead

2016-03-19 Thread Boris Zbarsky

On 3/17/16 9:50 AM, Nicolas B. Pierron wrote:

Note that this worked on B2G, but it would not work for Gecko. For example,
all tab add-ons have to use toSource to patch the JS functions.


Note that we do have the capability to lazily load the source from disk
when someone does this, and we do use it in Gecko for some things. We
could use it for more things.


-Boris